Closed lennijusten closed 8 months ago
@lennijusten can you resolve conflicts with current main, and then ask me to review again?
(note that when it says I force-pushed that's actually you force pushing from a deploy key you asked me to add :( )
@jeffkaufman Fixed! Can you review?
In past commits to main I added a feature to `ribofrac()` that counted the total reads in `total_reads_dict` and returned `np.nan` if the sample had zero reads. However, when running the new NAO data through `ribofrac()`, several samples returned `np.nan` even though they definitely have reads in them.

One possible cause of this issue is that the files were somehow empty or not appropriately copied over from AWS. I've added a `file_integrity_check()` function that checks potential inputs (minus the ".settings" and ".discarded" files) to see 1) if the path exists after copying down from AWS, and 2) if the files contain any reads.

If none of the potential inputs exists and contains reads, `ribofrac()` will skip the sample and not output anything to AWS. This seems better than outputting `np.nan` without being certain that the files were correctly processed, which could mislead people. It also allows the pipeline to be re-run without existing, potentially incorrect, ribofrac entries appearing to be already done.
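
For reviewers, here is a rough sketch of the kind of check described above. It is not the code in this PR: the helper names `has_reads`, `inputs_with_reads`, and `process_sample` are hypothetical, and it assumes the inputs are plain or gzipped FASTQ files (four lines per read) and that the excluded files can be recognized by ".settings" or ".discarded" appearing in the file name.

```python
import gzip
import os


def has_reads(path):
    """Return True if the file exists and holds at least one FASTQ record.

    Assumption: inputs are FASTQ (4 lines per read), optionally gzipped.
    """
    if not os.path.exists(path):
        return False
    opener = gzip.open if path.endswith(".gz") else open
    try:
        with opener(path, "rt") as handle:
            n_lines = sum(1 for _ in handle)
    except (OSError, EOFError):
        # Corrupt or truncated file: treat it as having no usable reads.
        return False
    return n_lines >= 4


def inputs_with_reads(paths):
    """Filter candidate inputs down to those that exist and contain reads."""
    excluded = (".settings", ".discarded")  # assumed naming convention
    candidates = [
        p for p in paths
        if not any(marker in os.path.basename(p) for marker in excluded)
    ]
    return [p for p in candidates if has_reads(p)]


def process_sample(sample, paths):
    """Skip the sample (return None) when no input passes the check."""
    usable = inputs_with_reads(paths)
    if not usable:
        print(f"Skipping {sample}: no input file exists and contains reads")
        return None
    return usable  # the real pipeline would go on to compute ribofrac here
```

Returning `None` rather than `np.nan` keeps "we could not verify the inputs" distinct from "the sample has zero reads", so a later re-run does not see an entry that looks finished.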