Closed YibiChen closed 2 months ago
Yes, the integration tests all succeed except one. This is relevant to test_batch_recovery, where long read file is given, but long read type is "none". In my script (aviary/modules/assembly/scripts/spades_assembly.py 63-65), I raised a ValueError for this scenario whereas in the earlier rule this would be defaulted to pacbio. Is there any reason to set the default type as pacbio? I believe nanopore is more common.
I also have another concern about the tmpdir. In test_short_read_recovery_queue_submission, the tmpdir I got from qsub-logs is still /tmp, which is the $TMPDIR in the local env where I ran aviary. Is this correct? If I were to run bin chicken, would the aviary command be running on the remote node?
Yes, the integration tests all succeed except one. This is relevant to test_batch_recovery, where long read file is given, but long read type is "none". In my script (aviary/modules/assembly/scripts/spades_assembly.py 63-65), I raised a ValueError for this scenario whereas in the earlier rule this would be defaulted to pacbio. Is there any reason to set the default type as pacbio? I believe nanopore is more common.
This doesn't seem to make a whole lot of sense. The default long read type in the CLI is ont
, none
is not a valid option. You shouldn't be supplying "none" as a long read type in the first place. If you are asking why pacbio was the final branch in the if statement originally, then it's because there are 4 different pacbio read types and only 2 ont types so it's programatically easier to catch the first 2 ont types and then safely assume pacbio after that.
If you don't have long reads, then this rule shouldn't even be getting run
Yes, the integration tests all succeed except one. This is relevant to test_batch_recovery, where long read file is given, but long read type is "none". In my script (aviary/modules/assembly/scripts/spades_assembly.py 63-65), I raised a ValueError for this scenario whereas in the earlier rule this would be defaulted to pacbio. Is there any reason to set the default type as pacbio? I believe nanopore is more common.
This doesn't seem to make a whole lot of sense. The default long read type in the CLI is
ont
,none
is not a valid option. You shouldn't be supplying "none" as a long read type in the first place. If you are asking why pacbio was the final branch in the if statement originally, then it's because there are 4 different pacbio read types and only 2 ont types so it's programatically easier to catch the first 2 ont types and then safely assume pacbio after that.If you don't have long reads, then this rule shouldn't even be getting run
I am sorry I didn't explain this well. I just thought it would be nice to have more sanity check. A good example is the test case for batch submission (see test/data/example_batch.tsv). Although long read type
is "ont" for both samples here, it becomes "none" in the individual config.yaml files. This is likely causing by an index mistake somewhere. More importantly, the command would finish without any complains, even though spade
were running in pacbio
mode not nanopore
. This could also happen if the user misspelled "ont" or used uppercase.
I can resubmit the pull request without the sanity check if you think it's not nessesary or I can correct the current version with the correct pacbio codes. And thanks for all the comments, they are very helpful.
The choices for long-read-type are restricted by the cli (https://github.com/rhysnewell/aviary/blob/78321260f1f1dd8e527079ff23b3404ce7d55cae/aviary/aviary.py#L481), so typos will error immediately.
The choices for long-read-type are restricted by the cli (
https://github.com/rhysnewell/aviary/blob/78321260f1f1dd8e527079ff23b3404ce7d55cae/aviary/aviary.py#L481 ), so typos will error immediately.
I think I understand what Yibi is saying, but I don't exactly understand how the long_read_type
became none
. The batch input doesn't have the same safety check in place that the CLI does, users could potentially introduce a typo in their batch file if the user isn't writing the batch scripts to a file
But batch mode doesn't seem to have this restriction.
@rhysnewell maybe you can try to run the test case and figure it out?
My vote would be to add checks for batch mode, rather than worry at each stage if the variable is garbage. Though in a separate PR
@YibiChen I've pinpointed the issue here and it's an easy fix, but it will go in a separate PR. Just remove your pacbio check from your script and respond to the other comments, thanks
The integration tests all succeed. Let me know if you spot anything else.
Do the integration tests all succeed?