icebert / eccDNA_RCA_nanopore

eccDNA identification from nanopore long reads of rolling-circle amplicon
MIT License

Flec - full-length eccDNA caller

eccDNA identification from nanopore long reads of rolling-circle amplicon

DOI:10.1038/s41586-021-04009-w DOI:10.1038/s41596-022-00783-7

Dependencies (Python packages)

Usage

required arguments for input:
  --fastq <STR>         input reads in fastq format
  --paf <STR>           alignments file in PAF format generated by minimap2
  --reference <STR>     reference genome file in fasta format

required arguments for output:
  --info <STR>          output file for sequences information
  --seq <STR>           output file for consensus sequences in fasta format
  --var <STR>           output file for variants

optional arguments:
  --maxOffset <INT>     maximum offset of start/end positions between two sub-reads
                        to be considered as mapping to the same location [default: 20]
  --minMapQual <INT>    minimum mapping quality of sub-reads [default: 30]
  --minDP <INT>         minimum depth to call variants [default: 4]
  --minAF <FLOAT>       minimum alternative allele frequency to call variants [default: 0.75]

  --verbose             print details of consensus construction
  -h, --help            show help message
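
The optional thresholds above can be read as simple predicates. The sketch below is only an illustration and is not taken from Flec's source: the function names `same_location` and `passes_variant_filter` are hypothetical, and it assumes that comparisons are inclusive, that both start and end must fall within --maxOffset, and that --minDP applies to total depth.

```python
def same_location(a, b, max_offset=20):
    """Return True if two sub-read alignments plausibly map to the same place.

    a and b are (chrom, start, end, strand) tuples.
    Assumption: both start and end must agree within max_offset.
    """
    return (
        a[0] == b[0]
        and a[3] == b[3]
        and abs(a[1] - b[1]) <= max_offset
        and abs(a[2] - b[2]) <= max_offset
    )


def passes_variant_filter(support, depth, min_dp=4, min_af=0.75):
    """Apply the --minDP and --minAF defaults to one candidate variant.

    Assumption: min_dp is compared against total depth, not supporting depth.
    """
    return depth >= min_dp and support / depth >= min_af
```

Under these assumptions, a variant supported by 3 of 4 reads passes with the default settings, while one supported by 3 of 3 reads fails the depth threshold.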

Output

The info file contains all meta information for each eccDNA identified, with 6 columns:

field       description
readname    The name (id) of each read generated by Nanopore
Nfullpass   Number of full passes of this eccDNA covered by this read
Nfragment   Number of fragments (genomic locations) that form this eccDNA
refLength   Total length of the reference genome region(s) to which this eccDNA was mapped
seqLength   Actual sequence length of this eccDNA
fragments   The genomic location of origin of each fragment composing this eccDNA

When Nfullpass is 0, no eccDNA was identified for this Nanopore read.

The coordinates in fragments are 1-based and inclusive. Multiple fragments are separated by |.

Example info file:

readname Nfullpass Nfragment refLength seqLength fragments
3561e493-0b99-4a11-a517-de5681276d82 0 0 0 0
8bac2bc4-9e1c-4804-97c7-1dd88184b2b8 1 1 1047 1047 chr5:144628101-144629147(+)
15ce164b-ef1f-42b2-af1f-9a10c3abf23b 10 1 1024 1024 chrX:145145309-145146332(-)
665d4815-998b-42be-8af7-9a2dc31157b3 5 2 628 627 chr10:91847836-91848275(+)|chr19:58942249-58942436(+)
02714a9a-3753-47e9-b770-8aa606856ecc 4 2 507 505 chr12:53934104-53934326(+)|chr12:86923760-86924043(-)
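
Because the coordinates are 1-based and inclusive, each fragment spans end - start + 1 bases, and in the example rows these spans add up to refLength. The following is a minimal parsing sketch, assuming the info file is whitespace-separated as shown; the helper names `parse_fragments`, `parse_info_line` and `fragment_span` are hypothetical and not part of Flec.

```python
import re

# One fragment looks like chr10:91847836-91848275(+)
FRAGMENT_RE = re.compile(r"^(\S+):(\d+)-(\d+)\(([+-])\)$")


def parse_fragments(field):
    """Split the fragments column on '|' into (chrom, start, end, strand) tuples."""
    fragments = []
    for item in field.split("|"):
        match = FRAGMENT_RE.match(item)
        if match is None:
            raise ValueError(f"unrecognized fragment: {item!r}")
        chrom, start, end, strand = match.groups()
        fragments.append((chrom, int(start), int(end), strand))
    return fragments


def parse_info_line(line):
    """Parse one data row of the info file into a dict."""
    cols = line.split()
    return {
        "readname": cols[0],
        "Nfullpass": int(cols[1]),
        "Nfragment": int(cols[2]),
        "refLength": int(cols[3]),
        "seqLength": int(cols[4]),
        # Rows with Nfullpass == 0 have an empty fragments column.
        "fragments": parse_fragments(cols[5]) if len(cols) > 5 else [],
    }


def fragment_span(fragments):
    """Sum of fragment lengths, using 1-based inclusive coordinates."""
    return sum(end - start + 1 for _, start, end, _ in fragments)
```

For the two-fragment read 665d4815-998b-42be-8af7-9a2dc31157b3 in the example, the spans are 440 and 188 bases, which together give the reported refLength of 628.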

The seq file contains the reconstructed full-length sequence of each eccDNA in fasta format. The id of each sequence is the readname.

The var file contains the variants inferred from the Nanopore reads by comparison with the reference genome sequence, with 6 columns:

field   description
1       chromosome
2       position in the reference genome (1-based)
3       reference nucleotide(s)
4       alternate nucleotide(s); '-' means deletion
5       supportive coverage depth
6       total coverage depth

Example var file:

col1 col2 col3 col4 col5 col6
chr5 125935640 G A 6 8
chr11 93766201 G - 4 4
chr17 17326883 A ATCT 5 5
chr17 45437665 GG - 3 4
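
The alternative allele frequency of a variant can be recomputed from columns 5 and 6 as supportive depth divided by total depth. The sketch below assumes whitespace-separated rows as in the example; the `Variant` class and its `kind` labels are hypothetical, and classifying a longer alternate allele as an insertion is an assumption based on the example rows.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int      # 1-based position in the reference genome
    ref: str
    alt: str      # '-' means deletion
    support: int  # supportive coverage depth (column 5)
    depth: int    # total coverage depth (column 6)

    @property
    def allele_frequency(self):
        return self.support / self.depth

    @property
    def kind(self):
        # Assumed labels, for illustration only.
        if self.alt == "-":
            return "deletion"
        if len(self.alt) > len(self.ref):
            return "insertion"
        return "substitution"


def parse_var_line(line):
    """Parse one data row of the var file."""
    chrom, pos, ref, alt, support, depth = line.split()
    return Variant(chrom, int(pos), ref, alt, int(support), int(depth))
```

For the first example row, `parse_var_line("chr5 125935640 G A 6 8").allele_frequency` is 6 / 8 = 0.75, which matches the default --minAF threshold.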

Using --verbose is recommended. It prints the mapping structure of each Nanopore read, so users can check the details of each eccDNA constructed from a rolling-circle amplification Nanopore read. The output can be saved to a log file by piping it through | tee out.log. Example output:

8f1bb745-b054-4c99-9609-489cf234ea90
#Fragment: 1    Full Pass: 3    Read Length: 6283

           24 - chr5:56004938-56005172 (-) -   253
          254 - chr5:56004933-56006381 (-) -  1721
         1722 - chr5:56004933-56006372 (-) -  3169
         3170 - chr5:56004937-56006372 (-) -  4598

Location:
        chr5:56004933-56006372 (-)
665d4815-998b-42be-8af7-9a2dc31157b3
#Fragment: 2    Full Pass: 5    Read Length: 3434

                                                            1 - chr19:58942250-58942436 (+) -   185
          186 - chr10:91847836-91848275 (+) -   626       627 - chr19:58942249-58942436 (+) -   813
          814 - chr10:91847836-91848275 (+) -  1235      1236 - chr19:58942249-58942435 (+) -  1429
         1430 - chr10:91847836-91848279 (+) -  1867      1868 - chr19:58942252-58942436 (+) -  2053
         2054 - chr10:91847836-91848275 (+) -  2481      2482 - chr19:58942249-58942433 (+) -  2671
         2672 - chr10:91847838-91848275 (+) -  3098      3099 - chr19:58942249-58942436 (+) -  3278
         3279 - chr10:91847836-91847993 (+) -  3434

Location:
        chr10:91847836-91848275 (+)     chr19:58942249-58942436 (+)
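
The segment lines of the verbose output can be pulled apart with a regular expression, for example to count how many sub-reads map to each fragment. This is a rough sketch based only on the example output above: the layout is assumed to be "read start - chrom:start-end (strand) - read end", possibly with several segments on one line, and the function name `parse_segments` is hypothetical.

```python
import re

# read_start - chrom:start-end (strand) - read_end
SEGMENT_RE = re.compile(
    r"(\d+)\s+-\s+(\S+):(\d+)-(\d+)\s+\(([+-])\)\s+-\s+(\d+)"
)


def parse_segments(line):
    """Return every (read_start, chrom, start, end, strand, read_end) on a line."""
    return [
        (int(rs), chrom, int(start), int(end), strand, int(re_))
        for rs, chrom, start, end, strand, re_ in SEGMENT_RE.findall(line)
    ]
```

Lines that carry no segment, such as the read name or the "Location:" header, yield an empty list under this pattern. The final "Location:" entries have no read coordinates and are therefore not matched either.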

Example sequencing data

The example Nanopore reads of rolling-circle amplified eccDNA are available at

DOI:10.6084/m9.figshare.17046158.v1

Citation

Wang, Y., Wang, M., Djekidel, M.N. et al. eccDNAs are apoptotic products with high innate immunostimulatory activity. Nature 599, 308–314 (2021).

Wang, Y., Wang, M. & Zhang, Y. Purification, full-length sequencing and genomic origin mapping of eccDNA. Nat Protoc (2022)