mrmckain / Fast-Plast

Automated de novo assembly of whole chloroplast genomes.
MIT License

Begin Extensions / Fuse contigs loop #56

Open kroeve opened 1 year ago

kroeve commented 1 year ago

Hey there!

In some cases I run into a never-ending loop of Begin Extensions / Fuse contigs:

0:12:03 Begin Extensions
0:12:08 Fuse contigs
0:12:08 Begin Extensions
0:12:13 Fuse contigs
[...]
65:45:15    Begin Extensions
67:01:03    Fuse contigs
67:01:03    Begin Extensions
68:22:04    Fuse contigs
68:22:04    Begin Extensions

And so on. The only thing I can do then is cancel the run. The ...error.log file only contains:

Building a SMALL index
8523560 reads; of these:
  8523560 (100.00%) were paired; of these:
    8339894 (97.85%) aligned concordantly 0 times
    169 (0.00%) aligned concordantly exactly 1 time
    183497 (2.15%) aligned concordantly >1 times
    ----
    8339894 pairs aligned concordantly 0 times; of these:
      93 (0.00%) aligned discordantly 1 time
    ----
    8339801 pairs aligned 0 times concordantly or discordantly; of these:
      16679602 mates make up the pairs; of these:
        16615296 (99.61%) aligned 0 times
        664 (0.00%) aligned exactly 1 time
        63642 (0.38%) aligned >1 times
2.53% overall alignment rate
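To compare a stuck sample against one that finishes, it can help to pull the key numbers out of this summary automatically. The following is a minimal sketch, assuming the error log contains a bowtie2-style paired-end summary laid out as in the excerpt above; the function names `parse_bowtie2_summary` and `concordant_pairs` are made up for illustration and are not part of Fast-Plast.

```python
import re

# Patterns for the summary lines shown in the excerpt above.
# Assumption: the log uses exactly this wording.
PATTERNS = {
    "total_reads": r"^\s*(\d+) reads; of these:",
    "concordant_0": r"^\s*(\d+) \([\d.]+%\) aligned concordantly 0 times",
    "concordant_1": r"^\s*(\d+) \([\d.]+%\) aligned concordantly exactly 1 time",
    "concordant_multi": r"^\s*(\d+) \([\d.]+%\) aligned concordantly >1 times",
    "overall_rate_pct": r"^\s*([\d.]+)% overall alignment rate",
}


def parse_bowtie2_summary(text):
    """Return a dict with whichever summary fields are found in `text`."""
    stats = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text, re.MULTILINE)
        if match is None:
            continue  # field missing from this log; leave it out
        value = match.group(1)
        stats[key] = float(value) if key == "overall_rate_pct" else int(value)
    return stats


def concordant_pairs(stats):
    """Pairs that aligned concordantly at least once."""
    return stats.get("concordant_1", 0) + stats.get("concordant_multi", 0)
```

Running this over the error logs of all samples gives one row per sample, so the looping samples can be checked for unusual read counts or alignment rates.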

Currently, I am retrying my run to find out whether this problem always occurs on the same sample, and I will add an answer here as soon as I have more information!

Cheers, Evelin

Update: It seems to get stuck on certain samples, but which ones depends on the settings used. I am currently doing two Fast-Plast runs with different settings. Each run got stuck twice on the same sample, but the stuck sample differs between the two runs. So one run got stuck on a sample that the other run processed, although in that other run the afin stage (where the loop happens) took ~40 minutes.
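To tell a slow afin stage apart from a real loop, the progress log can be scanned for repeated Begin Extensions / Fuse contigs pairs. This is a minimal sketch, assuming each progress line is an elapsed time (`H:MM:SS`, with hours allowed to exceed 24) followed by the message, as in the excerpt above. The names `extension_cycles` and `looks_stuck` and the `max_cycles` threshold are hypothetical; the actual loop limit in afin is not stated in this thread.

```python
import re

# Elapsed time (hours may exceed two digits), whitespace, then the message.
LINE_RE = re.compile(r"^\s*(\d+):(\d{2}):(\d{2})\s+(.+?)\s*$")


def parse_line(line):
    """Return (elapsed_seconds, message), or None for lines that do not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    hours, minutes, seconds, message = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds), message


def extension_cycles(lines):
    """Durations in seconds of each Begin Extensions -> Fuse contigs pair."""
    durations = []
    start = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        elapsed, message = parsed
        if message == "Begin Extensions":
            start = elapsed
        elif message == "Fuse contigs" and start is not None:
            durations.append(elapsed - start)
            start = None
    return durations


def looks_stuck(lines, max_cycles=50):
    """Heuristic only: flag a run with more cycles than max_cycles (assumed value)."""
    return len(extension_cycles(lines)) > max_cycles
```

On the excerpt above, the first two cycles take 5 seconds each while the last two take well over an hour each, so the per-cycle durations may be as informative as the cycle count.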

mrmckain commented 1 year ago

Hi Evelin,

I have not seen this issue before. afin should stop after a set number of loops.

Can you send the settings you are using? Options passed to Fast-Plast do not impact afin like this, but I am curious as to what is happening.

Best, Michael

mrmckain commented 1 year ago

Also, do you notice anything different about the samples? Read length? Composition? Contamination? Read number?

kroeve commented 1 year ago

Hey Michael,

thanks for answering! The two sets of settings I have used are:

./fast-plast.pl -1 forward_paired.fq -2 reverse_paired.fq -n prefix --bowtie_index Sapindales --coverage_analysis --threads 8 --clean deep --adapters TruSeq

./fast-plast.pl -1 forward_paired.fq -2 reverse_paired.fq -n prefix --bowtie_index Sapindales --coverage_analysis --threads 8 --skip trim --clean deep

I'm working with target enrichment data from the Angiosperms353 bait set and am trying to assemble plastomes from the off-target reads, so the number of plastid reads varies a lot between samples. For example, one sample that ended up in a loop in the run without trimming contains 246568 plastid paired reads, whereas for another sample with only 40407 plastid paired reads, Fast-Plast produced assemblies in both runs. The mean read length is 150 bp for both samples, and this should also be the case for the other samples.

As I'm comparing different assemblers for my approach, I also used GetOrganelle and NOVOPlasty, and with those I haven't noticed any problems with the samples that ended up in a loop. Altogether, only 5 samples out of 166 ended up in a loop.
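With 166 samples and only a handful that loop, a batch can be kept moving by giving each run a time limit instead of cancelling by hand. This is a rough sketch, assuming a POSIX system; the flags in `build_command` are copied from the first command above, while the helper names and the time limit are hypothetical. The run is started in its own session so that the whole process group, including any child programs it launches, receives the signal.

```python
import os
import signal
import subprocess


def build_command(forward, reverse, name, threads=8):
    """Fast-Plast call with the flags from the first command in this thread."""
    return [
        "./fast-plast.pl", "-1", forward, "-2", reverse, "-n", name,
        "--bowtie_index", "Sapindales", "--coverage_analysis",
        "--threads", str(threads), "--clean", "deep", "--adapters", "TruSeq",
    ]


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return 'ok', 'failed', or 'timeout' (POSIX only)."""
    # start_new_session=True makes the child a process group leader,
    # so its pid doubles as the process group id.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return_code = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Terminate the whole group, not just the top-level script.
        os.killpg(proc.pid, signal.SIGTERM)
        proc.wait()
        return "timeout"
    return "ok" if return_code == 0 else "failed"
```

A call such as `run_with_timeout(build_command("forward_paired.fq", "reverse_paired.fq", "prefix"), timeout_s=6 * 3600)` would then mark a looping sample as `timeout` and let the batch continue; the six-hour limit is only an example value.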

If I remember correctly, I saw this problem in the closed issues, but the user stopped replying to you, so the problem could not be solved. I hope this bit of information helps. I only had a very quick look at my data, so if you need to know anything else, I'm happy to help. I can also upload any of the log files, just let me know.

Cheers, Evelin

mrmckain commented 1 year ago

Hi Evelin,

If you could post the logs, that would be great. Additionally, if you could email me, I would like to test one of your problem data sets.

My email is mrmckain at Ua dot edu