Closed GoogleCodeExporter closed 8 years ago
I have attached the parameter file I am using to simulate the reads.
best,
thomas
Original comment by T.Bonf...@googlemail.com
on 2 Sep 2011 at 7:23
A bug when generating FASTA/FASTQ sequences occurs when read identifiers are
sufficiently long. Ensembl transcript identifiers are comparatively long, and
because the transcript identifier is part of the read ID / FASTA tag, the issue
occurs in the given dataset.
Original comment by gmicha@gmail.com
on 2 Sep 2011 at 2:33
Hi Micha,
thank you for your fast reply and analysis. However, I think the problem must
be somewhere else. I have mapped the Ensembl transcript IDs to a set of integer
values and replaced the original IDs with them, but this doesn't help...
Cheers,
thomas
Original comment by T.Bonf...@googlemail.com
on 5 Sep 2011 at 11:41
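The workaround described in the comment above, replacing long transcript identifiers with short integer labels, could look roughly like the sketch below. It is only an illustration: the function names `build_id_map` and `replace_ids` are hypothetical, and the input is assumed to be plain text lines in which each identifier appears as a whole word. The thread does not say how the mapping was actually done.

```python
import re


def build_id_map(transcript_ids):
    """Give each distinct identifier an integer label, in order of first appearance."""
    id_map = {}
    for tid in transcript_ids:
        if tid not in id_map:
            id_map[tid] = str(len(id_map) + 1)
    return id_map


def replace_ids(lines, id_map):
    """Replace every known identifier in each line with its integer label.

    Identifiers are matched as whole words, longest first, so that one
    identifier is never rewritten inside a longer one.
    """
    if not id_map:
        return list(lines)
    alternatives = sorted(id_map, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in alternatives) + r")\b")
    return [pattern.sub(lambda m: id_map[m.group(1)], line) for line in lines]
```

Keeping the map around makes it possible to translate the integer labels back to the original identifiers after simulation.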
Hi Thomas,
reconstructing the issue you described, I definitely came across a bug in the
code, and removing the erroneous lines made the problem disappear. The
circumstances that provoke the error are difficult to predict in the general
case: it was related to an overflow in a buffer used during FASTA/FASTQ
creation. The length of the read identifier certainly influenced the aberrant
behavior, so the length of Ensembl identifiers seemed to me the most likely
explanation for why we had not noticed the problem before.
However, the example you provided works well for us with the bundle we just put
in the download section:
http://fluxcapacitor.googlecode.com/files/fbi.genome.simulator-1.0-RC4.tar.gz
I am therefore marking the issue as fixed; please notify me if you find
evidence to the contrary.
cheers, micha
Original comment by gmicha@gmail.com
on 5 Sep 2011 at 12:41
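To illustrate why a buffer problem of the kind described in the comment above would depend on identifier length, here is a minimal sketch. It is not the simulator's code. The function names, the capacity of 64 bytes, and the record layout are all assumptions made for illustration. Python cannot overrun memory the way lower-level buffer code can, so the sketch models the symptom as silent truncation once a record exceeds the fixed capacity.

```python
def format_fasta_fixed(read_id, sequence, capacity=64):
    """Naive version: copy the record into a fixed-size buffer.

    Anything beyond `capacity` bytes is silently dropped, so the result
    depends on how long the read identifier happens to be.
    """
    record = f">{read_id}\n{sequence}\n".encode("ascii")
    buf = bytearray(capacity)
    n = min(len(record), capacity)
    buf[:n] = record[:n]
    return bytes(buf[:n])


def format_fasta_sized(read_id, sequence):
    """Safer version: size the buffer from the record itself."""
    record = f">{read_id}\n{sequence}\n".encode("ascii")
    buf = bytearray(len(record))
    buf[:] = record
    return bytes(buf)
```

With a short identifier both functions return the same record. With a long one, the fixed-size version returns a record that is cut off partway through, which is why such a defect can go unnoticed until a dataset with long identifiers is used.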
Hi Micha,
the update works, thank you for your help! :)
best,
t
Original comment by T.Bonf...@googlemail.com
on 5 Sep 2011 at 2:43
Original issue reported on code.google.com by
gmicha@gmail.com
on 1 Sep 2011 at 12:22