Closed — schorlton closed this issue 1 year ago
Hi
this may be a strange thing to ask, but would you be able to try running the same thing without `--format fastq` (letting falco figure out the format) and let us know if you have the same problem? It may help us zero in on the problem.
Thank you!
Thanks for your quick response! Running without `--format fastq` hit the same problem. I also tried running on the ungzipped file without `--format fastq`, and that also hit the same issue. Thanks!
Hm, that's very strange indeed. I had a suspicion that `--format fastq` was forcing the program to interpret the gzipped file as uncompressed, thus leading to problems.
Would it be too much to ask for you to share either the entire dataset or a small subset that causes the problem on your machine? I'd be very interested in seeing if I can reproduce it too; this may be more complicated than I thought. I understand that it's tricky to ask these things for private data, so no worries if you cannot, or if it's more convenient you can always send it to me at desenabr[at]usc[dot]edu.
Finally - and this is admittedly a hail mary - does the program also fail if you don't use any flags? Just `falco nanopore.fastq.gz`?
Thank you!
@schorlton If you are only able to share part of the dataset, please include something with the same structure for the read names, unless they are really short and contiguous. Also try to include any special characters that would be in the original file, for example by obtaining the smaller subset with command-line tools. Finally, the most helpful subset would have similar variability in read lengths. Thanks!
Running `falco nanopore.fastq.gz` hits the same issue.
I've managed to reproduce on the first random public nanopore sequencing file I pulled from SRA: SRR21529942.fastq.gz. Looking forward to your analysis.
I can report that, using the provided file:
If @guilhermesena1 notices the same, we will need to find the problem and update conda. In the meantime, @schorlton, if you are in an environment where you can build falco from source, maybe you could try that using the source here. If you encounter problems doing that, we can try to help. Otherwise we hope you'll wait for the fix on conda, which we will try to get done ASAP.
Glad I'm not going crazy :stuck_out_tongue_closed_eyes: Thanks for your investigation - I'll stay tuned for a conda fix as I can use FastQC in the interim. -Sam
Problem identified. Hopefully we can have a fix soon on conda. If you need it before we can do that, you'll have to try building from source -- preferably the release I linked above.
The problem can also be reproduced on a clone/release if we delete/rename the files in the Configuration directory, so it may have something to do with the defaults.
@schorlton if you need a quick fix, you can point falco to the configuration files in conda as follows:
$ falco -c /path/to/your/conda/opt/falco-1.2.0/Configuration/contaminant_list.txt \
-a /path/to/your/conda/opt/falco-1.2.0/Configuration/adapter_list.txt \
-l /path/to/your/conda/opt/falco-1.2.0/Configuration/limits.txt \
[the rest of your parameters here]
I think when compiling falco internally through conda (since they ship only the binary), it's still not finding the Configuration folder under /opt. The way fastqc does it is to compile everything at the conda root directory, move everything to opt, then make a symlink under the bin directory that points to the binary under opt. I didn't like that approach very much and was trying to avoid it, but I guess if we want to emulate their behavior, we will emulate their behavior. We'll try to fix this, but it may take a bit longer since it's done through a pull request.
I think I found the source of the problem and pushed a possible fix at bca5f11. When we used the default values, the shortest adapter size wasn't being set properly, and subtracting 1 led to underflow. You should also be able to run the code just using the `-a` flag in the command above. In any case, we'll roll this into a new release ASAP and update on conda.
Sorry for the inconvenience and thank you again for sharing the issue with us!
Thanks for the quick fix! Will test out the conda update once released.
Unfortunately now getting a new error on 1.2.1 from bioconda:
falco -skip-summary -skip-report nanopore.fastq.gz
[limitst] using file /opt/conda/opt/falco-1.2.1/Configuration/limits.txt
instruction for limit duplication not found in file /opt/conda/opt/falco-1.2.1
Yep, trying to figure this one out. What's weird is that it is reading the correct file; even more so, if you do
falco -l /opt/conda/opt/falco-1.2.1/Configuration/limits.txt [etc etc]
it runs OK, but not if you don't specify the file, even if the path is correct.
Alright, I know better than to get my hopes up, but we made some changes in the most recent release and merged them to conda to address the issue.
As far as I can tell, a conda download of falco puts the config files at opt/falco/Configuration, and the falco binary in conda correctly uses these files to process the data. @schorlton if you are still having issues and are still interested in testing the most recent changes, we would love to know if things are running correctly on your end too. Just bear in mind you may have to run `conda remove falco` prior to updating.
It works - thank you! The only remaining issue is specifying the file format: passing `--format fastq` when the file is compressed FASTQ causes a crash. Omitting the flag works.
We will create another issue to address this. Ideally fq and fq.gz should be treated equally, but the reading process is a bit different: calling `--format fastq` reports in the output that the file is being read as uncompressed, and failing then makes sense, since the input is technically malformed. Regardless, this does warrant clarification.
I'll close this issue since the initial problem has been resolved, but any further problems regarding the stalling can always be reported by reopening it.
Thanks for falco!
Running v1.2.0 installed from bioconda on a nanopore FASTQ. It looks like it generates the fastqc_data.txt properly, but after that it consumes over 32 GB of RAM over several minutes until I kill it... Is this expected? For comparison, FastQC processes the file in 18 s within 1 GB of RAM. Here are the stats of the file: