jonathonthill / sangerseqR

git for the Bioconductor package "sangerseqR"

Unable to understand difference between two files throwing a warning #7

Open upgradedavid opened 4 months ago

upgradedavid commented 4 months ago

Hi Jonathon,

I've been using your fantastic package sangerseqR to import .ab1 files, but I am quite clueless about what causes the warning below, which I have been observing for only one of the two provided files. I used readsangerseq() for the import.

"Invalid characters removed from primary basecalls. This may result in basecalls being shifted. Please check chromatogram."

It's quite surprising, given that they were obtained in the same experiment, with exactly the same settings etc. Do you know what's different between those two files and what might have caused it? It would be absolutely amazing if you could help me out here!

All the best, David

no_error.zip: https://github.com/jonathonthill/sangerseqR/files/15263007/no_error.zip

jonathonthill commented 4 months ago

Hi David,

Unfortunately, we have never figured out why, but ab1 files sometimes have non-nucleotide characters in them. It is almost like the files get corrupted, as the characters are completely random. However, if we don’t remove them, we get errors. I wish I knew more, but this is just something we have observed.

Jonathon
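
To make the idea of "non-nucleotide characters" concrete, here is a minimal sketch of how one might locate them in a basecall string before deciding what to do about them. It is written in plain Python for illustration and is not sangerseqR's code; the set of valid characters (the IUPAC nucleotide codes), the function names, and the example string are all assumptions.

```python
# Assumption: "valid" means one of the IUPAC nucleotide codes.
# The check sangerseqR actually performs may use a different alphabet.
IUPAC_CODES = set("ACGTRYSWKMBDHVN")


def find_invalid_basecalls(basecalls: str) -> list[tuple[int, str]]:
    """Return (0-based index, character) for every non-IUPAC character."""
    return [
        (i, ch)
        for i, ch in enumerate(basecalls)
        if ch.upper() not in IUPAC_CODES
    ]


def strip_invalid_basecalls(basecalls: str) -> str:
    """Return the basecalls with every non-IUPAC character removed."""
    return "".join(ch for ch in basecalls if ch.upper() in IUPAC_CODES)


if __name__ == "__main__":
    # Hypothetical basecall string with two stray characters in it.
    example = "ACGT?ACG\x00TN"
    for index, char in find_invalid_basecalls(example):
        print(f"invalid character {char!r} at index {index}")
    print(strip_invalid_basecalls(example))
```

Reporting the positions, rather than silently stripping, shows whether the stray characters sit at the very end of the read or in the middle of it.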

upgradedavid commented 4 months ago

Thank you very much for your answer. So how might this odd inherent error in the .ab1 files influence my downstream analysis? Is it merely a wrong representation in the graph function? Wrong basecalling by sangerseqR? Is the primary basecalling by the sequencing company affected? I am implementing a small app where users are presented with your graphs to decide whether they can see a pattern, and then manually resolve the ambiguity. So I was worried that peaks might be shifted or similar.

jonathonthill commented 4 months ago

Hi David,

Sounds like an interesting project. I would love to see it when you are finished.

Removing the invalid characters should essentially create a small deletion in the primary sequence called by the sequencing software. The Sanger graphing function will not know about the deletion, so the base letters may become shifted compared to the chromatogram. The case we had that led to the warning/fix being added to the package was at the end of the sequence, so we did not see any negative effects. However, we were not sure if that would always be the case.

Jonathon
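
The shift described above can be illustrated with a small model. This is a sketch in plain Python under stated assumptions, not sangerseqR's implementation: it assumes each basecall has exactly one peak position, that stripping removes only the character and not its peak, and that letters are then paired with peaks by index. All names and numbers are hypothetical.

```python
# Assumption: valid characters are the IUPAC nucleotide codes.
VALID = set("ACGTRYSWKMBDHVN")


def pair_after_stripping(basecalls, peak_positions):
    """Drop invalid characters, then pair the remaining letters with the
    peaks by index. Letters after a removed character slide one peak left."""
    cleaned = [ch for ch in basecalls if ch in VALID]
    return list(zip(cleaned, peak_positions))


def pair_keeping_alignment(basecalls, peak_positions):
    """Drop each invalid character together with its own peak position,
    so every remaining letter stays on its original peak."""
    return [
        (ch, pos)
        for ch, pos in zip(basecalls, peak_positions)
        if ch in VALID
    ]


def first_shifted_index(basecalls):
    """Index (in the cleaned sequence) of the first letter that index-based
    pairing would put on the wrong peak, or None if no letter is affected,
    e.g. when the invalid characters only trail the read."""
    for i, ch in enumerate(basecalls):
        if ch not in VALID:
            has_valid_after = any(c in VALID for c in basecalls[i + 1:])
            return i if has_valid_after else None
    return None


if __name__ == "__main__":
    peaks = [10, 22, 35, 47, 60, 71]  # hypothetical peak positions
    print(pair_after_stripping("ACG?TA", peaks))    # T lands on peak 47
    print(pair_keeping_alignment("ACG?TA", peaks))  # T stays on peak 60
    print(first_shifted_index("ACG?TA"))            # mid-read: shift starts here
    print(first_shifted_index("ACGTA?"))            # trailing only: no shift
```

In this model a stray character at the end of the read affects nothing, which matches the case that prompted the warning, while one in the middle displaces every later letter by one peak. For an app that shows chromatograms to users, a check like `first_shifted_index` could flag reads where the displayed letters deserve a closer look.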
