Closed ssnn-airr closed 6 years ago
Original comment by Koda Hirotomo (Bitbucket: [Hirotomo Koda](https://bitbucket.org/Hirotomo Koda)).
Now we've tried the kleinstein/immcantation:devel image and everything works. I also appreciate your helpful advice on the AlignSets tool. This also works well on our data.
Thank you very much for all of this!
Original comment by Jason Vander Heiden (Bitbucket: javh, GitHub: javh).
Sure thing. I ran the new file you sent and it worked fine.
However, the consensus sequences are poor quality, with a lot of Ns in them due to the abundance of indels. With --freq 0.6, BuildConsensus requires 60% of the reads to have the same non-N character at a position or it'll assign an N. So you get a lot of situations like the following:
ID> CATTATTACGTCGTGG
SEQCOUNT> 9
INSEQ1> CTCGCACAGTAATAGACGGCCGTGTCCGCGACGGTGACAGAGGTCATCCGCAGGGAGA
INSEQ2> CTCGCACAGTAATAGACGGCCGTGTCCGCGACGTGACAGAGGTCATCCGCAGGGAGAA
INSEQ3> CTCGCACAGTAATAGACGGCCGTGTCCGCGACGTGACAGAGGTCATCCGCAGGGAGAA
INSEQ4> CTCGCACAGTAATAGACGGCCGTGTCCGCGACGTGACAGAGGTCATCCGCAGGGAGAA
INSEQ5> CTCGCACAGTAATAGACGGCCGTGTCCGCGACGGTGACAGAGGTCATCCGCAGGGAGA
INSEQ6> CTCGCACAGTAATAGACGGCCGTGTCCGCGACGGTGACAGAGGTCATCCGCAGGGAGA
INSEQ7> CTCGCACAGTAATAGACGGCCGTGTCCGCGACGGTGACAGAGGTCATCCGCAGGGAGA
INSEQ8> TCGCACAGTAATAGACGGCCGTGTCCGCGACGTGACAGAGGTCATCCGCAGGGAGAAC
INSEQ9> CTCGCACAGTAATAGACGGCCGTGTCCGCGACGTGACAGAGGTCATCCGCAGGGAGAA
CONSENSUS> TCGCACAGTAATAGACGGCCGTGTCCGCGACGNNNNNNNNGNNNNNCNNNNGGNNNAN
QUALITY> ```````````!!!!!!!!{!!!!!{!!!!{{!!!{
Here, a series of frameshifts causes a lot of positions in the consensus to be Ns.
I would run AlignSets-muscle on the files before running BuildConsensus.
The results looked a lot better when I did this (I'll email you example output separately).
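To make the effect of the frequency threshold concrete, here is a minimal Python sketch. It is not pRESTO's implementation: the function name `frequency_consensus`, the toy reads, and the choice to treat gap characters like Ns are all assumptions made for illustration. It only shows why a one-base deletion in half of a UMI group turns every downstream column into an N, and why aligning the reads first removes most of those Ns.

```python
from collections import Counter

MISSING = {"N", "-"}  # assumption: gaps are treated like Ns in this sketch


def frequency_consensus(seqs, min_freq=0.6):
    """Call a consensus column by column; emit N when no base reaches min_freq."""
    length = max(len(s) for s in seqs)
    consensus = []
    for i in range(length):
        # Collect the informative characters present at this position.
        column = [s[i] for s in seqs if i < len(s) and s[i] not in MISSING]
        if not column:
            consensus.append("N")
            continue
        base, count = Counter(column).most_common(1)[0]
        # The frequency is taken over all reads in the group.
        consensus.append(base if count / len(seqs) >= min_freq else "N")
    return "".join(consensus)


# Toy UMI group: two reads carry a one-base deletion after position 3,
# which shifts every downstream column out of register.
unaligned = ["ACGGTGACAG", "ACGGTGACAG", "ACGTGACAGA", "ACGTGACAGA"]

# The same reads after a (hypothetical) multiple alignment step.
aligned = ["ACGGTGACAG", "ACGGTGACAG", "ACG-TGACAG", "ACG-TGACAG"]

print(frequency_consensus(unaligned))  # ACGNNNNNNN
print(frequency_consensus(aligned))    # ACGNTGACAG
```

In the unaligned group no base reaches 60% after the deletion point, so the tail collapses to Ns. After alignment only the column containing the indel itself remains ambiguous.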
Original comment by Koda Hirotomo (Bitbucket: [Hirotomo Koda](https://bitbucket.org/Hirotomo Koda)).
Great. Just in case, I've sent you another example file which causes the same error. This time no N characters are included, but there ARE some sequences containing characters with quality 0. These passed the QC step because their average quality score is above the threshold. I hope you can try this file too. There is no deadline pressure on my side.
Thank you again!
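The point about quality-0 characters surviving QC can be sketched in a few lines of Python. This is an illustration only, not the pRESTO filtering code: the helper names and the threshold of 20 are hypothetical values chosen for the example.

```python
def mean_quality(quals):
    """Average Phred score of a read."""
    return sum(quals) / len(quals)


def passes_mean_filter(quals, threshold=20):
    """Hypothetical QC rule: keep the read if its mean quality meets the threshold."""
    return mean_quality(quals) >= threshold


# A read with 50 good positions followed by a 5-base tail of quality 0.
quals = [35] * 50 + [0] * 5

print(round(mean_quality(quals), 1))  # 31.8
print(passes_mean_filter(quals))      # True
print(min(quals))                     # 0
```

A filter based on the mean says nothing about individual positions, so a read like this one is kept even though its tail carries no usable quality information.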
Original comment by Jason Vander Heiden (Bitbucket: javh, GitHub: javh).
Thanks. I got the test data. The problem was caused by some UMI groups having positions where the quality score was 0 in all of the sequences (the tail end was all N characters with quality 0).
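As a rough illustration of how a column with quality 0 in every read can produce a 'float division by zero', consider the following Python sketch. It is an assumption about the general failure mode, not the actual pRESTO code or the actual fix: the function `weighted_base_frequency` and its `guard` parameter are hypothetical.

```python
def weighted_base_frequency(bases, quals, guard=False):
    """Return (base, share of total quality weight supporting it) for one column."""
    weights = {}
    for base, qual in zip(bases, quals):
        weights[base] = weights.get(base, 0.0) + float(qual)
    total = sum(weights.values())
    if guard and total == 0:
        # No read carries any quality weight here, so call the position N.
        return "N", 0.0
    best = max(weights, key=weights.get)
    return best, weights[best] / total  # fails when total == 0.0


# A normal column: two reads support A, one supports C.
print(weighted_base_frequency("AAC", [30, 30, 20]))  # ('A', 0.75)

# A column from an all-N tail where every read has quality 0.
try:
    weighted_base_frequency("NNN", [0, 0, 0])
except ZeroDivisionError as err:
    print("unguarded:", type(err).__name__)

print(weighted_base_frequency("NNN", [0, 0, 0], guard=True))  # ('N', 0.0)
```

Checking for a zero total before dividing is one simple way to handle such columns. Whether the fix in presto takes this form is not stated in the thread.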
I added a fix to the version of presto at the tip of the default branch in the bitbucket repo.
As it looks like you are using the Docker container, I'm rebuilding the kleinstein/immcantation:devel image right now, and you can get the fix by using that image. The image hasn't finished building yet. It should be done by 3pm your time.
I'll try to make an official release of presto with the fix sometime next week, but I have some testing to do on the MaskPrimers changes first.
Original comment by Koda Hirotomo (Bitbucket: [Hirotomo Koda](https://bitbucket.org/Hirotomo Koda)).
Thank you for your quick reply! I've sent you an email with example files.
Original comment by Jason Vander Heiden (Bitbucket: javh, GitHub: javh).
Greetings @hkoda,
I'll take a look at this. Would you be able to email me an example file that fails? (jason.vanderheiden at yale.edu).
My guess is that there's a sequence that's entirely N characters, or something similar, in the data that we aren't accounting for correctly. I'm not sure without looking at it though.
Original report by Koda Hirotomo (Bitbucket: [Hirotomo Koda](https://bitbucket.org/Hirotomo Koda)).
We're using pRESTO to process UMI-barcoded Ion Torrent PGM BCR mRNA reads (400 bp), which are prone to indel errors, like Roche 454 pyrosequencing. Reads are single-end and all in reverse orientation (they start with a 12 bp UMI with fixed nucleotides inserted, followed by the C-region primer sequence). The QC, UMI annotation and primer masking (both sides) steps seem to run without problems.
During the BuildConsensus step, this error occurs frequently but not always (it probably depends on the dataset). The failing runs always show 'float division by zero' messages, and I have to kill the process manually after the error.
Adding the '--nproc 1' option made no difference.
Here's our system environment. We're happy to provide more information if you need it.