oschwengers / asap

A scalable bacterial genome assembly, annotation and analysis pipeline
https://doi.org/10.1371/journal.pcbi.1007134
GNU General Public License v3.0

characterization steps failed #10

Closed jagos01 closed 4 years ago

jagos01 commented 4 years ago

Hello, I am trying to analyze Nanopore data with ASA³P. Quality control, assembly and reference mapping finish, while the SNP and scaffolding steps are aborted and all other steps fail (check fails). Are all features supported with Nanopore data? Any help would be appreciated. I have attached the log file if needed. Thanks, Scott asap.log

oschwengers commented 4 years ago

Hello @jagos01, thanks for reaching out. For some reason the SNP detection and scaffolding steps fail while the QC and assembly steps finish successfully. Hence, I'd say that the configuration seems to be OK. To dive a little deeper, could you please provide the *.stdout.log within the snps directory as well as the stdout.log from one subdirectory within the scaffolds dir so that I can check what exactly happens? Best regards

jagos01 commented 4 years ago

Hello, Thank you for your help. Attached are the logs you requested. Scott BaS.std.log std.log

oschwengers commented 4 years ago

Hi, according to the logs, it seems that you have provided the reference GCF_000832635.1_ASM83263v1.fasta twice. Could you please check whether this is the case? You can provide more than one reference, but each one must be unique.

jagos01 commented 4 years ago

I included two files for the reference, .fasta and .gff, both with the same prefix. Should only the FASTA file be used for the reference? Thanks, Scott

oschwengers commented 4 years ago

Yes, exactly. You can provide reference genomes as GenBank, EMBL or FASTA files. GFF + FASTA is not supported. If available, I'd vote for GenBank or EMBL, since called SNPs can then be annotated. I'll close this issue; feel free to reopen it if the problem persists. Best regards!
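The rules above (one file per reference, in GenBank, EMBL or FASTA format) can be checked before starting a run. The following is a minimal sketch and not part of ASA³P: the function name `check_references` and the list of accepted file extensions are assumptions made for illustration, and the pipeline's own accepted suffixes may differ.

```python
from collections import defaultdict
from pathlib import Path

# Assumed extensions for GenBank, EMBL and FASTA files; ASA³P itself may
# accept a different set of suffixes.
SUPPORTED = {".gbk", ".gb", ".gbff", ".embl", ".fasta", ".fa", ".fna"}


def check_references(paths):
    """Return a list of human-readable problems found in a reference list."""
    problems = []
    by_stem = defaultdict(list)
    for path in map(Path, paths):
        if path.suffix.lower() not in SUPPORTED:
            problems.append(f"unsupported reference format: {path.name}")
        # Files that differ only in their extension count as the same reference.
        by_stem[path.stem].append(path.name)
    for stem, names in by_stem.items():
        if len(names) > 1:
            problems.append(
                f"reference '{stem}' given more than once: {', '.join(names)}"
            )
    return problems


if __name__ == "__main__":
    # The combination described in this thread: FASTA plus GFF, same prefix.
    for problem in check_references([
        "GCF_000832635.1_ASM83263v1.fasta",
        "GCF_000832635.1_ASM83263v1.gff",
    ]):
        print(problem)
```

For the FASTA plus GFF pair from this thread, the sketch flags both the unsupported `.gff` file and the duplicated prefix.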

jagos01 commented 4 years ago

After changing the reference to a GenBank file, SNP detection completed (although no SNPs were detected?). However, scaffolding still failed and ABR, annotation, MLST, taxonomy and VF detection were skipped (check failed?). I have attached the asap.log and stdout.log from the scaffolding directory. asap.log std.log Thanks, Scott

oschwengers commented 4 years ago

According to your logs, the scaffolding step aborts because MeDuSa cannot find sufficient synteny information between your sample's contigs and the provided reference genome. Also, the mapping process took only ~25 sec and the SNP detection step only 5 sec. I'd assume that your reference may not be sufficiently closely related to your sample. Could you test this, for instance, via Mash?
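One way to run such a check is to call `mash dist` on the reference and the assembled contigs and read the distance from its tab-separated output (reference ID, query ID, distance, p-value, matching hashes). The sketch below assumes that the `mash` executable is on the PATH; the helper names and the file names in the usage note are hypothetical.

```python
import subprocess


def parse_mash_dist(line):
    """Parse one tab-separated line of `mash dist` output into a dict."""
    ref, query, dist, pval, hashes = line.rstrip("\n").split("\t")
    shared, total = hashes.split("/")
    return {
        "reference": ref,
        "query": query,
        "distance": float(dist),
        "p_value": float(pval),
        "shared_hashes": int(shared),
        "sketch_size": int(total),
    }


def mash_distance(reference, query):
    """Run `mash dist reference query` and return the parsed first result line."""
    result = subprocess.run(
        ["mash", "dist", reference, query],
        check=True,           # raise if mash exits with an error
        capture_output=True,
        text=True,
    )
    return parse_mash_dist(result.stdout.splitlines()[0])
```

With hypothetical file names, `mash_distance("reference.fasta", "assembly.fasta")["distance"]` gives the value to compare; a distance close to 0 indicates a closely related reference.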

jagos01 commented 4 years ago

This run used a subsample of my Nanopore data (50K reads). Unicycler was able to assemble it into 2 contigs. Using the assembled FASTA and the reference sequence from the scratch directory gave a Mash distance of 0.000705841. My input is a Nanopore FASTQ file; do I need any other input files? Also, is it possible to use an existing assembly as input for SNP, ABR, MLST and VF detection?
Thanks, Scott

jagos01 commented 4 years ago

However, when I run Mash on the subsampled FASTQ file, I get a distance of 0.11676. Do I need to use error-corrected reads?

oschwengers commented 4 years ago

For Nanopore data, ASA³P internally also uses Unicycler, so the assembly itself should not be a problem. If your assembly results in such a low Mash distance against your reference, then the reference should actually be OK as well. This seems rather odd and I'm a little bit puzzled. Along with the Nanopore reads, you can provide Illumina reads if you have them. But running ASA³P with Nanopore reads only is absolutely fine, though I would deem the SNP detection somewhat meaningless for Nanopore reads due to the low base quality.
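To put the two distances reported in this thread in context: Mash derives its distance from the estimated Jaccard index j of the two k-mer sets as D = -(1/k) ln(2j / (1 + j)). The sketch below assumes Mash's default k-mer size of 21 and sketch size of 1000, which the thread does not state. Under that assumption, 0.000705841 corresponds to about 971 of 1000 shared hashes and 0.11676 to about 45 of 1000. One plausible reason for the gap is that a k-mer from a raw read only matches the reference if all k bases are error-free, so uncorrected reads share far fewer k-mers than the assembly does. The function names are hypothetical.

```python
import math


def mash_distance_from_jaccard(jaccard, k=21):
    """Mash distance D = -(1/k) * ln(2j / (1 + j)) for a Jaccard estimate j."""
    if jaccard <= 0.0:
        return 1.0  # no shared hashes: Mash reports the maximum distance
    return -math.log(2.0 * jaccard / (1.0 + jaccard)) / k


def error_free_kmer_fraction(error_rate, k=21):
    """Expected fraction of k-mers without any error, assuming independent
    per-base errors at the given rate (a simplifying assumption)."""
    return (1.0 - error_rate) ** k


if __name__ == "__main__":
    # Shared-hash counts back-computed from the distances in this thread,
    # assuming default Mash settings (k=21, sketch size 1000).
    print(mash_distance_from_jaccard(971 / 1000))  # assembly vs. reference
    print(mash_distance_from_jaccard(45 / 1000))   # raw reads vs. reference
```

The error model is deliberately crude; it only illustrates why a distance computed from raw Nanopore reads can look much larger than one computed from the assembly of the same sample.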

You could try to run ASA³P on your Nanopore assembly. Then you'd skip the SNP detection but get the results for ABR, MLST, etc. Please provide the Nanopore assembly as contigs-ordered in the template in order to skip the scaffolding step.

I'm looking forward to your results. If needed, I could dig a little deeper into your data, if it is OK for you to share them somehow.

Best regards

jagos01 commented 4 years ago

I was able to run ASA³P with the Nanopore assembly using contigs-ordered. Using the raw reads with this data set still gave errors; however, I tried 2 other Nanopore data sets and both worked fine with just the raw reads. I am not sure what the problem is with the 1st data set. Thank you for all your help. Scott

oschwengers commented 4 years ago

You're welcome, and thanks for reaching out with this. Sometimes it is hard to debug such cases remotely, as you never know what's in the data. I'm glad that the pipeline works for your other data sets (and hopefully lives up to your expectations). Best regards!