Using Bio-Tradis to analyze mariner-transposon library

minisciencegirl commented 6 years ago

I am currently using Bio-Tradis to analyze several mariner-transposon libraries. The transposon-genome junctions are PCR amplified using a P5 transposon-specific primer and a P7 common-hexamer primer. This low-diversity library is sequenced by spiking it in with other high-diversity samples, rather than by using dark cycles.

I am following the Tradis protocol running bacteria_tradis, followed by tradis_gene_insert_sites. The transposon tag that I am using is "CCGGGGACTTATCAGCCAACCTGT".

Mariner transposons insert into TA sites in the genome. One issue that I am encountering is that I am getting more insertions (ins_counts) per gene than the number of TA sites present in the gene. What is going on?

Another issue is that I am having trouble opening the plot files directly in Artemis. Is there another way to visualize them (e.g. using IGV)?

I've attached my fastq.stats file from one Tradis run. fastq.stats.txt

lbarquist commented 6 years ago

Hi,

Glad to hear the sequencing seems to have worked. What exactly is the problem you're having with Artemis? You should be able to load these by first opening your genome annotation (embl or genbank should work), then going to Graph > Add User Plot, and selecting your plot file. If this isn't working, you can check that the files are being generated correctly -- if you unzip them, they should be two columns of numbers, showing the number of insertions on each strand. In principle this format isn't too different from the wiggle format IGV uses, but you'd have to add the appropriate headers with the sequence metadata.
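For anyone who wants to try the IGV route, here is a minimal sketch of that conversion, assuming the unzipped plot has one row per genome position with two whitespace-separated integer columns (one per strand). The function name `plot_lines_to_wig` is hypothetical and not part of Bio-Tradis, and the `chrom` value is an assumption that has to match the sequence name in the genome loaded in IGV. The sketch sums both strands into a single fixedStep wiggle track.

```python
import gzip


def plot_lines_to_wig(lines, chrom):
    """Turn two-column plot rows into the lines of a fixedStep wiggle track.

    `chrom` is an assumption: it must match the sequence name IGV knows.
    """
    out = [
        f'track type=wiggle_0 name="{chrom} insertions"',
        f"fixedStep chrom={chrom} start=1 step=1",
    ]
    for line in lines:
        fields = line.split()
        if not fields:
            # Skip empty lines (e.g. a trailing newline at the end of the file).
            continue
        # Sum the per-strand counts into one value per position.
        out.append(str(sum(int(value) for value in fields)))
    return out


def convert_plot_file(plot_path, wig_path, chrom):
    """Read a plain or gzipped plot file and write the wiggle track to disk."""
    opener = gzip.open if str(plot_path).endswith(".gz") else open
    with opener(plot_path, "rt") as src:
        wig_lines = plot_lines_to_wig(src, chrom)
    with open(wig_path, "w") as dst:
        dst.write("\n".join(wig_lines) + "\n")
```

To keep the strands separate, the same loop could write one track per column instead of summing them.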

Regarding having insertions at non-TA sites, I suspect the problem might be due to slippage either in the RT or sequencing. We and others (see the methods section here: http://mbio.asm.org/content/2/1/e00315-10.full#sec-8) have observed that occasionally you get 'satellite' insertions around very high abundance insertion sites. This is presumably due to small indel errors introduced during library prep or amplification. If this is the case, I would expect the insertion counts at non-TA sites to be fairly low, and in the vicinity of high-count TA insertion sites. There are a couple of approaches you could take to fix this: just drop the non-TA insertions, or add these satellite read counts to the count for a nearby genuine TA insertion site, as Gallagher et al. have done. We don't have anything written to do this automatically, so it would take some scripting.
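As a rough illustration of that scripting, here is a minimal sketch covering both options for a single strand. It is not the Gallagher et al. procedure and not part of Bio-Tradis: the function names, the default `window` of 2 bp, and the rule that a count sitting on either base of a TA dinucleotide is treated as genuine are all assumptions made for illustration. With `window=0` the non-TA counts are simply dropped; with a larger window they are added to the nearest TA position that already carries reads.

```python
def ta_positions(seq):
    """Return the 0-based indices lying on either base of a TA dinucleotide."""
    seq = seq.upper()
    sites = set()
    start = seq.find("TA")
    while start != -1:
        sites.update((start, start + 1))
        start = seq.find("TA", start + 1)
    return sites


def reassign_satellites(seq, counts, window=2):
    """Move counts at non-TA positions onto a nearby TA position, or drop them.

    `counts` holds one value per position of `seq` for a single strand.
    `window` (an assumed default) is the maximum distance in bp to search.
    """
    if len(seq) != len(counts):
        raise ValueError("sequence and count vector differ in length")
    sites = ta_positions(seq)
    # Start from the TA counts only; non-TA counts are re-added below if possible.
    cleaned = [c if i in sites else 0 for i, c in enumerate(counts)]
    for i, c in enumerate(counts):
        if c == 0 or i in sites:
            continue
        lo, hi = max(0, i - window), min(len(counts), i + window + 1)
        # Only TA positions that already carry reads count as genuine targets.
        nearby = [j for j in range(lo, hi) if j in sites and counts[j] > 0]
        if nearby:
            # Prefer the closest target, breaking ties by the higher count.
            target = min(nearby, key=lambda j: (abs(j - i), -counts[j]))
            cleaned[target] += c
    return cleaned
```

For a two-column plot file, the function would be applied to each strand column separately. Which base of the TA dinucleotide the reported position falls on should be checked against your own data before relying on this.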

Hope this helps, Lars

lumoswillow commented 5 years ago

Hello,

I am also having this issue with mariner libraries. I managed to stop false positive insertions by adjusting smalt_y to 1.

I have not managed to solve the issue with the plot files. When I try to open one in Artemis, I get the error message "error while reading user data read formatexception too many values in input file". Unzipping the file shows two columns of mainly zeros. Do we know what could be going on here? I have managed to generate a list of essential genes, but I would like to be able to visualize the insertions.

:)

lbarquist commented 5 years ago

Not sure, I've never encountered this. You should have the same number of rows in your plot file as you have positions in the genome sequence you're working with. Do these match?
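A quick way to run that comparison is sketched below, assuming the genome is available as FASTA and the plot file has been unzipped into rows of two whitespace-separated integers. The helper names `fasta_lengths` and `check_plot` are hypothetical, not Bio-Tradis functions. The first reports the length of every record in a (possibly multi-record) FASTA, and the second counts the plot rows and flags any row that is not two integers.

```python
def fasta_lengths(lines):
    """Return {record name: sequence length} for the lines of a FASTA file."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            name = (line[1:].split() or [""])[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def check_plot(plot_lines, expected_length):
    """Count plot rows and list the line numbers that are not two integers."""
    rows = 0
    malformed = []
    for number, line in enumerate(plot_lines, start=1):
        fields = line.split()
        if not fields:
            # Empty lines are not counted as rows.
            continue
        rows += 1
        if len(fields) != 2 or not all(f.isdigit() for f in fields):
            malformed.append(number)
    return {
        "rows": rows,
        "expected": expected_length,
        "matches": rows == expected_length,
        "malformed": malformed,
    }
```

If `matches` is false, the difference between `rows` and `expected` shows whether the plot is longer or shorter than the sequence it is being loaded against.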

unawareK commented 4 years ago

Hi, I'm also having issues with the plot file. I get the same error:

"error while reading user data: uk.ac.sanger.artemis.io.ReadFormatException: too many values in input file"

Has anybody found a way around it? Thanks!

andrewjpage commented 4 years ago

Could you attach the plot file which is causing the error?

unawareK commented 4 years ago

Hi, here is the plot file. Thanks!

A6381_S4_L001_R1_001.out.gz.CP022537.insert_site_plot.gz

andrewjpage commented 4 years ago

The file looks perfect. This is an Artemis error rather than an error with Bio-TraDIS. The Artemis developers @puethe @kpepper might be able to assist.

SioStef commented 4 years ago

Same problem here when reading the plot files with Artemis. For some reason my plot file contains more rows than there are positions in the sequence used. I used the bacteria_tradis pipeline on a multi-replicon genome, and the same happens for some of the replicons. I guess this is a Bio-TraDIS error rather than an Artemis one. Is it normal to have plots with more rows than the actual sequence length? Thanks in advance for any help on this.

sanger-pathogens / Bio-Tradis

Using Bio-Tradis to analyze mariner-transposon library #86