icebert / eccDNA_RCA_nanopore

eccDNA identification from nanopore long reads of rolling-circle amplicon

MIT License

Some problems with output files #1

Closed Leo152 closed 2 years ago

Leo152 commented 3 years ago

Hi, excuse me, what are the formats of the three files output by eccDNA_RCA_nanopore?
Is there any documentation of the output?

Thanks!

icebert commented 3 years ago

Hi Leo, thank you for your suggestion. I added descriptions of the output files to the README. I hope this is helpful.

Leo152 commented 3 years ago

Thank you very much. I'm looking at the README. Another question: I saw this in "eccDNAs are Apoptotic products with high Innate immunostimulatory activity." The article said that the analysis code could be obtained on GitHub, but I did not find it. May I ask whether this code can be made public? If so, could you provide a copy?

Thanks!

icebert commented 3 years ago

Hi Leo, all the code is provided in this repo. The code used for read mapping is available at:

https://github.com/icebert/eccDNA_RCA_nanopore/tree/main/mapping

The code for analyzing eccDNA from Nanopore mapped reads is available at:

https://github.com/icebert/eccDNA_RCA_nanopore/blob/main/eccDNA_RCA_nanopore.py

Leo152 commented 3 years ago

Sorry to bother you, and thank you very much for your reply.

Leo152 commented 3 years ago

Hello, sorry to bother you again. What are the specific steps and code for removing PCR duplicates and for running bedtools and karyoploteR in this section?

[Image: WeChat screenshot 20211105112621]

icebert commented 3 years ago

For these analyses, I used the command line as described in the methods (your screenshot) with the tools mentioned. I don't have a script for these analyses.

Leo152 commented 3 years ago

Regarding the analysis in this paragraph, I should use the BAM file output by minimap2 rather than the eccDNA_RCA_nanopore output file, right?

icebert commented 3 years ago

All the analyses are based on the eccDNA_RCA_nanopore output. The minimap2 output is just an intermediate file; the eccDNA_RCA_nanopore output is the final result. Each line is one identified eccDNA (we only use lines with Nfullpass>=2). For the genomic distribution, you only need the fragments column in the info file generated by eccDNA_RCA_nanopore. You can convert it to BED format and then use the uniq command to remove duplicates. Then bedtools genomecov -i <bed> -d -bg -g <chr.size> was used to generate coverage in bedGraph format, which was converted to bigWig format with bedGraphToBigWig. You can then plot the eccDNA distribution using karyoploteR with the bigWig file as input.
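The conversion and deduplication steps described above can be sketched in Python. This is only an illustration, not the author's script (none exists, per the earlier reply). It assumes, hypothetically, that each info-file row exposes an `Nfullpass` value and a `fragments` string in which fragments look like `chr1:100-200`, optionally followed by a strand such as `(+)`, and are separated by `|`; check the README for the actual layout. The names `FRAGMENT_RE`, `fragments_to_bed`, and `bed_lines` are made up for this example. Coordinates are copied unchanged, so adjust them if the info file is not already 0-based half-open as BED expects.

```python
import re

# Hypothetical fragment syntax: "chr1:100-200" with an optional "(+)" or "(-)" suffix.
FRAGMENT_RE = re.compile(r"^([^:]+):(\d+)-(\d+)(?:\([+-]\))?$")


def fragments_to_bed(rows, min_fullpass=2, sep="|"):
    """Return sorted, unique (chrom, start, end) tuples.

    rows: iterable of dicts with 'Nfullpass' and 'fragments' keys
          (assumed column names, not confirmed by the thread).
    """
    unique = set()
    for row in rows:
        # Keep only eccDNAs supported by at least min_fullpass full passes.
        if int(row["Nfullpass"]) < min_fullpass:
            continue
        for fragment in row["fragments"].split(sep):
            match = FRAGMENT_RE.match(fragment.strip())
            if match is None:
                continue  # skip anything that does not fit the assumed syntax
            chrom, start, end = match.groups()
            unique.add((chrom, int(start), int(end)))
    # A set plus sorted() has the same effect as `sort | uniq` on BED lines.
    return sorted(unique)


def bed_lines(intervals):
    """Format (chrom, start, end) tuples as tab-separated BED lines."""
    return ["\t".join(str(field) for field in interval) for interval in intervals]


example_rows = [
    {"Nfullpass": "3", "fragments": "chr1:100-200(+)"},
    {"Nfullpass": "2", "fragments": "chr1:100-200(+)|chr2:50-80(-)"},
    {"Nfullpass": "1", "fragments": "chr3:1-10(+)"},
]
example_bed = fragments_to_bed(example_rows)
```

Note that the shell `uniq` command only collapses adjacent identical lines, so the BED file should be sorted first (or use `sort -u`); the set in the sketch avoids that pitfall.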

Leo152 commented 3 years ago

I see. Thank you for your reply

Leo152 commented 2 years ago

Hello, excuse me. In the info file, multiple reads like the ones in the picture all have the same fragment, so they can be regarded as the same eccDNA molecule, right?
If I want to know how many eccDNA molecules there are in a sample, can I count them after removing duplicate fragments?

icebert commented 2 years ago

Yes, these are probably PCR duplicates.

Leo152 commented 2 years ago

Is there better duplicate-removal software? I'm using uniq now, but it doesn't remove fragments that are a few bp apart.

icebert commented 2 years ago

I also used 'uniq'. Currently almost all PCR duplicate removal tools, including 'picard markduplicates', are based solely on coordinates.
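For readers who still want to collapse fragments whose ends differ by a few bp, a tolerance-based grouping can be sketched as below. This is not what the author used (the analysis relied on exact-coordinate 'uniq'), and the function name `group_near_duplicates` and the default tolerance of 5 bp are arbitrary choices for illustration. The grouping is greedy: each fragment joins the first earlier group whose representative has both start and end within the tolerance, so results can depend on the tolerance chosen, and distinct molecules with nearby breakpoints may be merged.

```python
def group_near_duplicates(intervals, tol=5):
    """Greedily group (chrom, start, end) tuples whose start and end
    both lie within `tol` bp of a group's first member."""
    groups = []  # each group is a list; group[0] is its representative
    for chrom, start, end in sorted(intervals):
        placed = False
        # Representatives are in sorted order, so scan the most recent first.
        for group in reversed(groups):
            rep_chrom, rep_start, rep_end = group[0]
            if rep_chrom != chrom or start - rep_start > tol:
                break  # all earlier groups are even further away
            if abs(end - rep_end) <= tol:
                group.append((chrom, start, end))
                placed = True
                break
        if not placed:
            groups.append([(chrom, start, end)])
    return groups


example_intervals = [
    ("chr1", 100, 200),
    ("chr1", 102, 199),
    ("chr1", 100, 200),
    ("chr1", 100, 500),
    ("chr2", 100, 200),
]
example_groups = group_near_duplicates(example_intervals, tol=5)
group_sizes = [len(group) for group in example_groups]
```

With `tol=0` the function reduces to exact-coordinate grouping, which matches the behaviour of sorting and then running 'uniq'.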

Leo152 commented 2 years ago

May I ask if I understand this correctly? If the fragments are exactly the same, they are PCR duplicates. If a fragment differs by a few bp, it may be one of the reads that make up the eccDNA.

icebert commented 2 years ago

Yes, your understanding is right.

Leo152 commented 2 years ago

I would like to ask whether the "Unique eccDNA" count in this table was obtained by removing only the PCR duplicates, or by removing both the PCR duplicates and the fragments that differ by a few bp.
In my understanding, fragments that differ by a few bp are the same eccDNA and should be removed, but because they are not exactly identical they cannot be removed by software. The amount of data is also too large to remove them manually, so I don't know how to deal with them.

icebert commented 2 years ago

In this table, we used 'uniq' to get unique eccDNAs, so only fragments with identical positions were treated as PCR duplicates.

Leo152 commented 2 years ago

May I ask if the reference genome "MM10Combine" in your analysis can be downloaded? I only found the experimental data on the Internet, and I would like to compare them to see which step of my analysis has problems.

icebert commented 2 years ago

The mm10combine reference genome can be downloaded at: https://figshare.com/ndownloader/files/31960676

The example dataset can be downloaded at: https://figshare.com/ndownloader/files/31526759

The dataset of the original paper can be downloaded at GEO: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM5058061

Leo152 commented 2 years ago

Many thanks

Leo152 commented 2 years ago

I would like to ask you a question. In the info file, unless the fragment is repeated, we treat each line as one eccDNA. However, after PCR amplification, most eccDNAs should be supported by multiple reads, as shown in the following figure. So why are so many eccDNAs supported by just one read? In other words, why do most eccDNAs have only one line in the info file, with no duplicated fragment?

icebert commented 2 years ago

We used Rolling Circle Amplification, which is totally different from PCR. In PCR, one fragment is amplified over multiple cycles, but in RCA it is not. And given that we have demonstrated that eccDNAs are generated from apoptotic DNA fragments, they originate quite randomly from anywhere in the genome. So it is normal that many eccDNAs have only one read.

icebert commented 2 years ago

In addition, you should only use lines with Nfullpass>=2.