Closed Daghis closed 2 years ago
FYI, I think I've gotten the last few issues resolved since it's running seemingly successfully (if slowly) on hivevm193
.
If the issue is merely that the error messages being generated should be print()
'd as well as collected in the List[str]
that's returned, I'd prefer to address that by removing the verbose
argument and just having all messages be collected and printed.
How does that sound?
I was conflating self.errors
with local errors
variables. I believe I've corrected the logic so that the right value should be returned versus logged (when verbose is enabled... I left the argument in).
I've run a test with the fix to errors vs self.errors, and that did indeed fix the problem from the earlier test. I'm running another test to check on a hint of a second problem- I think an error string from verify_fastq_filename should produce a log message, but should not cause the dataset to be invalid. Ideally the regex that chooses what to scan could be less broad, and explicitly select .fastq, .fastq.gz, (and maybe .FASTQ.gz; I have seen those around), so that files like fastq.good.gz are never included in the scan in the first place. I'm thinking I need to do a PR against the test that scans generic .gz files, so that those scanned by this test aren't scanned again. One good way to do that mod would be to import the regex used by this test and exclude anything matching that regex.
Oh, that's right! I'd come across that regex in my searching and using, or at least replicating, that regex.
I'll take care of that tomorrow.
I've cleaned up that part by using fastq_utils
' code for finding the files in the first place, making that the canonical code for that. I suspect that my additional dependency to bring in fastq_utils
might cause an issue when running as a plugin. I'll need some guidance for how to correct that if that should come to pass.
Comment from @jswelling in Slack:
here's another delta for the FASTQ validator: when it doesn't find any FASTQ files to check, it needs to not set any errors. That way non-sequence datasets just skip this validation. I'll add that to the PR next time I touch it but I thought I'd let you know now. This was found by accident during David Betancur's experiments with running ingestions on TEST.
The most recent commit logs the no-files-found message, but doesn't return it as an error.
Inform future developers that working with this repository as standalone will not work and provide a reference to the parent repository.