wrf / genomeGTFtools

convert various features into a GFF-like file for use in genome browsers

microsynteny Name attribute change #10

Closed: mictadlo closed this issue 3 years ago

mictadlo commented 4 years ago

Hi all,

I ran:

microsynteny.py -b no_remark.out -q Q.no_remark.mRNA.gff3 -d NbR.no_remark.mRNA.gff3 -D "|" --make-gff > Q-NbR.no_remark.mRNA.microsynteny.gff3

DIAMOND blastp output:

NBqld01G09150.1 NBlab01G14990.1 100.0   209     0       0       1       209     409     617     8.5e-123        438.3
NBqld01G09150.1 NBlab01G14990.3 100.0   209     0       0       1       209     409     617     8.5e-123        438.3
NBqld01G09150.1 NBlab02G09300.1 95.7    209     9       0       1       209     191     399     4.1e-117        419.5
NBqld01G09150.1 NBlab02G09300.2 95.7    209     9       0       1       209     393     601     4.1e-117        419.5
NBqld01G09150.1 NBlab05G10170.1 42.3    182     104     1       1       181     269     450     8.1e-41 166.0
NBqld01G09150.1 NBlab16G10960.1 41.5    212     109     3       1       206     251     453     1.1e-40 165.6
NBqld01G09150.1 NBlab03G05570.2 40.6    197     108     2       1       197     418     605     2.4e-40 164.5
NBqld01G09150.1 NBlab15G04360.1 42.4    191     101     2       4       194     335     516     3.1e-40 164.1
NBqld01G09150.1 NBlab06G09770.1 40.9    193     105     2       1       193     538     721     1.2e-39 162.2
NBqld01G09150.1 NBlab06G09780.1 40.9    193     105     2       1       193     538     721     1.2e-39 162.2
NBqld01G09150.1 NBlab03G17680.1 42.8    194     102     3       1       194     317     501     2.0e-39 161.4
NBqld01G09150.1 NBlab03G17690.1 42.8    194     102     3       1       194     316     500     2.0e-39 161.4
NBqld01G09150.1 NBlab04G16080.1 39.6    197     110     2       1       197     418     605     2.6e-39 161.0
NBqld01G09150.1 NBlab04G16080.2 39.6    197     110     2       1       197     418     605     2.6e-39 161.0
NBqld01G09150.1 NBlab04G16080.3 39.6    197     110     2       1       197     418     605     2.6e-39 161.0
NBqld01G09150.1 NBlab04G16080.4 39.6    197     110     2       1       197     418     605     2.6e-39 161.0
NBqld01G09150.1 NBlab06G17770.1 42.8    194     102     2       1       194     391     575     2.6e-39 161.0
NBqld01G09150.1 NBlab19G04850.1 41.8    182     105     1       1       181     77      258     3.4e-39 160.6
NBqld01G09150.1 NBlab02G04500.1 41.0    183     104     1       1       183     294     472     5.8e-39 159.8
NBqld01G09150.1 NBlab12G07510.1 41.8    194     104     2       1       194     369     553     7.6e-39 159.5
NBqld01G09150.1 NBlab12G20100.1 40.0    200     109     3       1       200     493     681     9.9e-39 159.1
NBqld01G09150.1 NBlab12G20110.1 40.0    200     109     3       1       200     72      260     9.9e-39 159.1
NBqld01G09150.1 NBlab05G19490.1 41.7    199     101     3       5       201     552     737     1.3e-38 158.7
NBqld01G09150.1 NBlab05G19490.2 41.7    199     101     3       5       201     490     675     1.3e-38 158.7
NBqld01G09150.1 NBlab05G19490.3 41.7    199     101     3       5       201     552     737     1.3e-38 158.7
NBqld01G09150.2 NBlab01G14990.1 100.0   209     0       0       1       209     409     617     8.5e-123        438.3
NBqld01G09150.2 NBlab01G14990.3 100.0   209     0       0       1       209     409     617     8.5e-123        438.3
NBqld01G09150.2 NBlab02G09300.1 95.7    209     9       0       1       209     191     399     4.1e-117        419.5
NBqld01G09150.2 NBlab02G09300.2 95.7    209     9       0       1       209     393     601     4.1e-117        419.5
NBqld01G09150.2 NBlab05G10170.1 42.3    182     104     1       1       181     269     450     8.1e-41 166.0
NBqld01G09150.2 NBlab16G10960.1 41.5    212     109     3       1       206     251     453     1.1e-40 165.6
NBqld01G09150.2 NBlab03G05570.2 40.6    197     108     2       1       197     418     605     2.4e-40 164.5
NBqld01G09150.2 NBlab15G04360.1 42.4    191     101     2       4       194     335     516     3.1e-40 164.1
NBqld01G09150.2 NBlab06G09770.1 40.9    193     105     2       1       193     538     721     1.2e-39 162.2
NBqld01G09150.2 NBlab06G09780.1 40.9    193     105     2       1       193     538     721     1.2e-39 162.2
NBqld01G09150.2 NBlab03G17680.1 42.8    194     102     3       1       194     317     501     2.0e-39 161.4
NBqld01G09150.2 NBlab03G17690.1 42.8    194     102     3       1       194     316     500     2.0e-39 161.4
NBqld01G09150.2 NBlab04G16080.1 39.6    197     110     2       1       197     418     605     2.6e-39 161.0
NBqld01G09150.2 NBlab04G16080.2 39.6    197     110     2       1       197     418     605     2.6e-39 161.0
NBqld01G09150.2 NBlab04G16080.3 39.6    197     110     2       1       197     418     605     2.6e-39 161.0
NBqld01G09150.2 NBlab04G16080.4 39.6    197     110     2       1       197     418     605     2.6e-39 161.0
NBqld01G09150.2 NBlab06G17770.1 42.8    194     102     2       1       194     391     575     2.6e-39 161.0
NBqld01G09150.2 NBlab19G04850.1 41.8    182     105     1       1       181     77      258     3.4e-39 160.6
NBqld01G09150.2 NBlab02G04500.1 41.0    183     104     1       1       183     294     472     5.8e-39 159.8
NBqld01G09150.2 NBlab12G07510.1 41.8    194     104     2       1       194     369     553     7.6e-39 159.5
NBqld01G09150.2 NBlab12G20100.1 40.0    200     109     3       1       200     493     681     9.9e-39 159.1
NBqld01G09150.2 NBlab12G20110.1 40.0    200     109     3       1       200     72      260     9.9e-39 159.1
NBqld01G09150.2 NBlab05G19490.1 41.7    199     101     3       5       201     552     737     1.3e-38 158.7
NBqld01G09150.2 NBlab05G19490.2 41.7    199     101     3       5       201     490     675     1.3e-38 158.7
NBqld01G09150.2 NBlab05G19490.3 41.7    199     101     3       5       201     552     737     1.3e-38 158.7
NBqld01G09160.1 NBlab02G09310.1 94.6    112     6       0       1       112     1       112     3.2e-56 216.5
NBqld01G09160.1 NBlab01G15000.1 96.4    112     0       1       1       112     1       108     7.2e-56 215.3
NBqld01G09160.1 NBlab02G09310.3 84.8    112     5       1       1       112     1       100     8.0e-47 185.3
NBqld01G09160.3 NBlab01G15000.1 97.5    203     1       1       1       203     1       199     2.0e-105        380.6
NBqld01G09160.3 NBlab02G09310.1 94.0    201     12      0       1       201     1       201     1.2e-102        371.3
NBqld01G09160.3 NBlab02G09310.3 86.7    128     5       1       1       128     1       116     4.3e-55 213.4
NBqld01G09160.5 NBlab02G09310.1 94.6    112     6       0       1       112     1       112     3.2e-56 216.5
NBqld01G09160.5 NBlab01G15000.1 96.4    112     0       1       1       112     1       108     7.2e-56 215.3
NBqld01G09160.5 NBlab02G09310.3 84.8    112     5       1       1       112     1       100     8.0e-47 185.3
NBqld01G09170.1 NBlab02G09320.1 98.0    149     3       0       1       149     1       149     8.0e-83 305.1
NBqld01G09170.1 NBlab02G09320.2 98.0    149     3       0       1       149     1       149     8.0e-83 305.1
NBqld01G09170.1 NBlab02G09320.3 98.0    149     3       0       1       149     1       149     8.0e-83 305.1
NBqld01G09170.1 NBlab02G09320.4 98.0    149     3       0       1       149     1       149     8.0e-83 305.1
NBqld01G09170.1 NBlab19G02340.1 89.0    145     16      0       5       149     4       148     3.0e-74 276.6
NBqld01G09170.1 NBlab19G02340.2 89.0    145     16      0       5       149     4       148     3.0e-74 276.6
NBqld01G09170.1 NBlab05G15330.1 88.3    145     17      0       5       149     4       148     9.8e-73 271.6
NBqld01G09170.1 NBlab05G15330.2 88.3    145     17      0       5       149     4       148     9.8e-73 271.6
NBqld01G09170.1 NBlab01G15010.1 96.7    90      3       0       60      149     1       90      9.2e-47 185.3
NBqld01G09180.1 NBlab02G09330.2 92.1    114     9       0       10      123     13      126     6.4e-57 218.8
NBqld01G09180.1 NBlab02G09330.3 92.1    114     9       0       10      123     13      126     6.4e-57 218.8
NBqld01G09180.1 NBlab01G15020.1 100.0   104     0       0       14      117     1       104     2.7e-55 213.4
NBqld01G09180.1 NBlab01G15020.2 100.0   104     0       0       14      117     1       104     2.7e-55 213.4
NBqld01G09180.1 NBlab02G09330.1 96.3    108     4       0       10      117     13      120     3.5e-55 213.0
NBqld01G09180.2 NBlab01G15020.1 100.0   259     0       0       5       263     1       259     6.0e-150        528.9
NBqld01G09180.2 NBlab02G09330.1 94.7    263     14      0       1       263     13      275     1.4e-143        507.7
NBqld01G09180.2 NBlab01G15020.2 100.0   157     0       0       5       161     1       157     9.1e-90 328.9
NBqld01G09180.2 NBlab01G15020.3 98.7    158     2       0       106     263     6       163     1.9e-87 321.2
NBqld01G09180.2 NBlab02G09330.2 96.3    108     4       0       1       108     13      120     1.2e-54 212.2
NBqld01G09180.2 NBlab02G09330.3 96.3    108     4       0       1       108     13      120     1.2e-54 212.2
NBqld01G09180.2 NBlab07G01870.1 46.7    214     112     2       47      258     79      292     8.3e-51 199.5
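The hits above appear to follow the standard 12-column BLAST tabular layout (outfmt 6: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), which is DIAMOND's default output. A minimal sketch for loading such rows and keeping the top-scoring subject per query might look like this; `parse_hits` and `best_hit_per_query` are hypothetical helpers, not part of genomeGTFtools.

```python
from typing import Dict, List, NamedTuple

# Default field order of BLAST/DIAMOND tabular output (outfmt 6)
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

class Hit(NamedTuple):
    query: str
    subject: str
    pident: float
    evalue: float
    bitscore: float

def parse_hits(lines: List[str]) -> List[Hit]:
    """Parse 12-column tabular hits; whitespace splitting tolerates tabs or spaces."""
    hits = []
    for line in lines:
        cols = line.split()
        if len(cols) != len(FIELDS):
            continue  # skip blank or malformed lines
        rec = dict(zip(FIELDS, cols))
        hits.append(Hit(rec["qseqid"], rec["sseqid"], float(rec["pident"]),
                        float(rec["evalue"]), float(rec["bitscore"])))
    return hits

def best_hit_per_query(hits: List[Hit]) -> Dict[str, Hit]:
    """Keep the highest-bitscore hit for each query; on a tie the first row wins."""
    best: Dict[str, Hit] = {}
    for h in hits:
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return best

rows = [
    "NBqld01G09150.1 NBlab01G14990.1 100.0 209 0 0 1 209 409 617 8.5e-123 438.3",
    "NBqld01G09150.1 NBlab01G14990.3 100.0 209 0 0 1 209 409 617 8.5e-123 438.3",
    "NBqld01G09160.3 NBlab01G15000.1 97.5 203 1 1 1 203 1 199 2.0e-105 380.6",
    "NBqld01G09160.3 NBlab02G09310.1 94.0 201 12 0 1 201 1 201 1.2e-102 371.3",
]
best = best_hit_per_query(parse_hits(rows))
print(best["NBqld01G09150.1"].subject)  # NBlab01G14990.1 (tie: first row kept)
print(best["NBqld01G09160.3"].subject)  # NBlab01G15000.1
```

Ties such as NBlab01G14990.1 and NBlab01G14990.3 (both 438.3 for NBqld01G09150.1) are resolved purely by input order here, so isoform-level duplicates are not really collapsed by this step.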

GFF Query:

NBqld01 transdecoder    mRNA    75913489        75916784        .       +       .       ID=NBqld01G09150.1;Note=Pentatricopeptide repeat-containing protein At1g26900%2C mitochondrial;Parent=NBqld01G09150
NBqld01 transdecoder    mRNA    75913489        75916116        .       +       .       ID=NBqld01G09150.2;Note=Pentatricopeptide repeat-containing protein At1g26900%2C mitochondrial;Parent=NBqld01G09150
NBqld01 transdecoder    mRNA    75918271        75922448        .       -       .       ID=NBqld01G09160.1;Note=Ran guanine nucleotide release factor;Parent=NBqld01G09160
NBqld01 transdecoder    mRNA    75918532        75922421        .       -       .       ID=NBqld01G09160.3;Note=Ran guanine nucleotide release factor;Parent=NBqld01G09160
NBqld01 transdecoder    mRNA    75918532        75922448        .       -       .       ID=NBqld01G09160.5;Note=Ran guanine nucleotide release factor;Parent=NBqld01G09160
NBqld01 transdecoder    mRNA    75924269        75929750        .       +       .       ID=NBqld01G09170.1;Note=Cullin-3B;Parent=NBqld01G09170
NBqld01 transdecoder    mRNA    75932007        75936883        .       -       .       ID=NBqld01G09180.1;Note=haloacid dehalogenase-like hydrolase domain-containing protein 3;Parent=NBqld01G09180
NBqld01 transdecoder    mRNA    75932157        75936856        .       -       .       ID=NBqld01G09180.2;Note=Glyceraldehyde 3-phosphate phosphatase;Parent=NBqld01G09180
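GFF3 is tab-delimited (the pasted lines above show the tabs as spaces), and attribute values are percent-encoded, so %2C in the Note stands for a comma. A small sketch of decoding column 9 with Python's standard `urllib.parse.unquote`; `parse_attributes` is a hypothetical helper, and it does not split multi-valued (comma-separated) attributes.

```python
from typing import Dict
from urllib.parse import unquote

def parse_attributes(col9: str) -> Dict[str, str]:
    """Split GFF3 column 9 into a dict, percent-decoding each value (e.g. %2C -> ',')."""
    attrs = {}
    for field in col9.strip().split(";"):
        if not field:
            continue  # tolerate a trailing semicolon
        key, _, value = field.partition("=")
        attrs[key] = unquote(value)
    return attrs

line = ("NBqld01\ttranscoder_placeholder\tmRNA\t75913489\t75916784\t.\t+\t.\t"
        "ID=NBqld01G09150.1;Note=Pentatricopeptide repeat-containing protein "
        "At1g26900%2C mitochondrial;Parent=NBqld01G09150").replace(
            "transcoder_placeholder", "transdecoder")
cols = line.split("\t")  # tab split keeps the spaces inside Note intact
attrs = parse_attributes(cols[8])
print(attrs["Note"])  # Pentatricopeptide repeat-containing protein At1g26900, mitochondrial
```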

GFF Target:

NbV1Ch01        transdecoder    mRNA    90446162        90449165        .       +       .       ID=NBlab01G14990.1;Note=Pentatricopeptide repeat-containing protein At1g26900%2C mitochondrial;Parent=NBlab01G14990
NbV1Ch01        transdecoder    mRNA    90446162        90449165        .       +       .       ID=NBlab01G14990.3;Note=Pentatricopeptide repeat-containing protein At1g26900%2C mitochondrial;Parent=NBlab01G14990
NbV1Ch01        transdecoder    mRNA    88248079        88254983        .       -       .       ID=NBlab01G14500.1;Note=Heat shock protein 90-5%2C chloroplastic;Parent=NBlab01G14500
NbV1Ch01        transdecoder    mRNA    88248079        88254983        .       -       .       ID=NBlab01G14500.2;Note=Heat shock protein 90-5%2C chloroplastic;Parent=NBlab01G14500
NbV1Ch01        AUGUSTUS        mRNA    88291066        88292874        0.04    +       .       ID=NBlab01G14510.1;Note=nucleolin-like;Parent=NBlab01G14510
NbV1Ch01        AUGUSTUS        mRNA    88322551        88324084        0.1     -       .       ID=NBlab01G14520.1;Note=zinc finger BED domain-containing protein RICESLEEPER 2-like;Parent=NBlab01G14520

The resulting GFF3 file:

NBqld01 microsynteny    match   75913489        75936856        8       +       .       ID=blk-1905;Name=blk-1905_to_NbV1Ch01;Target=NbV1Ch01 90446162 90479531
NBqld01 microsynteny    match_part      75913489        75916784        438.3   +       .       ID=blk-1905.1.NBqld01G09150.1;Parent=blk-1905;Target=NBlab01G14990.1 90446162 90449165 +
NBqld01 microsynteny    match_part      75913489        75916116        438.3   +       .       ID=blk-1905.2.NBqld01G09150.2;Parent=blk-1905;Target=NBlab01G14990.3 90446162 90449165 +
NBqld01 microsynteny    match_part      75932007        75936883        213.4   -       .       ID=blk-1905.7.NBqld01G09180.1;Parent=blk-1905;Target=NBlab01G15020.1 90474498 90479531 -
NBqld01 microsynteny    match_part      75932157        75936856        528.9   -       .       ID=blk-1905.8.NBqld01G09180.2;Parent=blk-1905;Target=NBlab01G15020.1 90474498 90479531 -
NBqld01 microsynteny    match   75913489        75936856        8       +       .       ID=blk-1906;Name=blk-1906_to_NbV1Ch01;Target=NbV1Ch01 90446162 90479531000.1 90454994 90458939 -
NBqld01 microsynteny    match_part      75913489        75916784        438.3   +       .       ID=blk-1906.1.NBqld01G09150.1;Parent=blk-1906;Target=NBlab01G14990.3 90446162 90449165 +
NBqld01 microsynteny    match_part      75913489        75916116        438.3   +       .       ID=blk-1906.2.NBqld01G09150.2;Parent=blk-1906;Target=NBlab01G14990.1 90446162 90449165 +
NBqld01 microsynteny    match_part      75918271        75922448        215.3   -       .       ID=blk-1906.3.NBqld01G09160.1;Parent=blk-1906;Target=NBlab01G15000.1 90454994 90458939 -
NBqld01 microsynteny    match_part      75918532        75922421        380.6   -       .       ID=blk-1906.4.NBqld01G09160.3;Parent=blk-1906;Target=NBlab01G15000.1 90454994 90458939 -
NBqld01 microsynteny    match_part      75918532        75922448        215.3   -       .       ID=blk-1906.5.NBqld01G09160.5;Parent=blk-1906;Target=NBlab01G15000.1 90454994 90458939 -
NBqld01 microsynteny    match_part      75924269        75929750        185.3   +       .       ID=blk-1906.6.NBqld01G09170.1;Parent=blk-1906;Target=NBlab01G15010.1 90466931 90471854 +
NBqld01 microsynteny    match_part      75932007        75936883        213.4   -       .       ID=blk-1906.7.NBqld01G09180.1;Parent=blk-1906;Target=NBlab01G15020.1 90474498 90479531 -
NBqld01 microsynteny    match_part      75932157        75936856        528.9   -       .       ID=blk-1906.8.NBqld01G09180.2;Parent=blk-1906;Target=NBlab01G15020.1 90474498 90479531 -
NBqld01 microsynteny    match   75913489        75936856        8       +       .       ID=blk-1907;Name=blk-1907_to_NbV1Ch02;Target=NbV1Ch02 67391974 67445767310.1 67415016 67420166 -
NBqld01 microsynteny    match_part      75913489        75916784        419.5   +       .       ID=blk-1907.1.NBqld01G09150.1;Parent=blk-1907;Target=NBlab02G09300.1 67391974 67409155 +
NBqld01 microsynteny    match_part      75913489        75916116        419.5   +       .       ID=blk-1907.2.NBqld01G09150.2;Parent=blk-1907;Target=NBlab02G09300.2 67398533 67409155 +
NBqld01 microsynteny    match_part      75918271        75922448        216.5   -       .       ID=blk-1907.3.NBqld01G09160.1;Parent=blk-1907;Target=NBlab02G09310.1 67415016 67420166 -
NBqld01 microsynteny    match_part      75918532        75922421        371.3   -       .       ID=blk-1907.4.NBqld01G09160.3;Parent=blk-1907;Target=NBlab02G09310.1 67415016 67420166 -
NBqld01 microsynteny    match_part      75918532        75922448        216.5   -       .       ID=blk-1907.5.NBqld01G09160.5;Parent=blk-1907;Target=NBlab02G09310.1 67415016 67420166 -
NBqld01 microsynteny    match_part      75924269        75929750        305.1   +       .       ID=blk-1907.6.NBqld01G09170.1;Parent=blk-1907;Target=NBlab02G09320.1 67421569 67434518 +
NBqld01 microsynteny    match_part      75932007        75936883        218.8   -       .       ID=blk-1907.7.NBqld01G09180.1;Parent=blk-1907;Target=NBlab02G09330.2 67441539 67445767 -
NBqld01 microsynteny    match_part      75932157        75936856        507.7   -       .       ID=blk-1907.8.NBqld01G09180.2;Parent=blk-1907;Target=NBlab02G09330.1 67441539 67445767 -
NBqld01 microsynteny    match   75913489        75936856        8       +       .       ID=blk-1908;Name=blk-1908_to_NbV1Ch02;Target=NbV1Ch02 67398533 67445767310.1 67415016 67420166 -
NBqld01 microsynteny    match_part      75913489        75916784        419.5   +       .       ID=blk-1908.1.NBqld01G09150.1;Parent=blk-1908;Target=NBlab02G09300.2 67398533 67409155 +
NBqld01 microsynteny    match_part      75913489        75916116        419.5   +       .       ID=blk-1908.2.NBqld01G09150.2;Parent=blk-1908;Target=NBlab02G09300.1 67391974 67409155 +
NBqld01 microsynteny    match_part      75918271        75922448        216.5   -       .       ID=blk-1908.3.NBqld01G09160.1;Parent=blk-1908;Target=NBlab02G09310.1 67415016 67420166 -
NBqld01 microsynteny    match_part      75918532        75922421        371.3   -       .       ID=blk-1908.4.NBqld01G09160.3;Parent=blk-1908;Target=NBlab02G09310.1 67415016 67420166 -
NBqld01 microsynteny    match_part      75918532        75922448        216.5   -       .       ID=blk-1908.5.NBqld01G09160.5;Parent=blk-1908;Target=NBlab02G09310.1 67415016 67420166 -
NBqld01 microsynteny    match_part      75924269        75929750        305.1   +       .       ID=blk-1908.6.NBqld01G09170.1;Parent=blk-1908;Target=NBlab02G09320.1 67421569 67434518 +
NBqld01 microsynteny    match_part      75932007        75936883        218.8   -       .       ID=blk-1908.7.NBqld01G09180.1;Parent=blk-1908;Target=NBlab02G09330.2 67441539 67445767 -
NBqld01 microsynteny    match_part      75932157        75936856        507.7   -       .       ID=blk-1908.8.NBqld01G09180.2;Parent=blk-1908;Target=NBlab02G09330.1 67441539 67445767 -
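In the output above, blk-1905 and blk-1906 cover the same query span and the same target chromosome (NbV1Ch01), and blk-1907 and blk-1908 do the same on NbV1Ch02. One way to flag such repeats is to group the top-level match lines by query span and target sequence. A sketch, assuming tab-separated lines and that the first token of Target is the target sequence ID; `duplicate_blocks` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, int, int, str]  # (query seqid, start, end, target seqid)

def duplicate_blocks(gff_lines: List[str]) -> Dict[Key, List[str]]:
    """Return keys shared by more than one 'match' block, with the block IDs."""
    groups: Dict[Key, List[str]] = defaultdict(list)
    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "match":
            continue  # skip match_part lines and anything malformed
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        target_seqid = attrs.get("Target", "").split(" ")[0]
        key = (cols[0], int(cols[3]), int(cols[4]), target_seqid)
        groups[key].append(attrs.get("ID", "?"))
    return {k: ids for k, ids in groups.items() if len(ids) > 1}

rows = [
    "NBqld01\tmicrosynteny\tmatch\t75913489\t75936856\t8\t+\t.\t"
    "ID=blk-1905;Name=blk-1905_to_NbV1Ch01;Target=NbV1Ch01 90446162 90479531",
    "NBqld01\tmicrosynteny\tmatch\t75913489\t75936856\t8\t+\t.\t"
    "ID=blk-1906;Name=blk-1906_to_NbV1Ch01;Target=NbV1Ch01 90446162 90479531",
    "NBqld01\tmicrosynteny\tmatch\t75913489\t75936856\t8\t+\t.\t"
    "ID=blk-1907;Name=blk-1907_to_NbV1Ch02;Target=NbV1Ch02 67391974 67445767",
]
print(duplicate_blocks(rows))
# {('NBqld01', 75913489, 75936856, 'NbV1Ch01'): ['blk-1905', 'blk-1906']}
```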

Query browser: [screenshot: Screen Shot 2020-03-31 at 8 45 01 PM]

Target browser: [screenshot: Screen Shot 2020-03-31 at 9 10 17 PM]

  1. Is there a setting to change match_part to match, so that the features get easier-to-read names, e.g. Name=NBlab01G14990.1-vs-NBqld01G09150.1?
  2. Did I use -D "|" correctly?

Thank you in advance,

Michal

wrf commented 4 years ago

Hello again,

Firstly, it looks like the matches worked for the most part, and you may not need the -D option.

The feature types match_part and match are standard terms that are hardcoded in the script. Depending on what you want, it might be easiest to change them with sed in your output file.
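If sed is not convenient, the same post-processing can be done by rewriting only the third (type) column, so IDs and other attributes are never changed by accident. A sketch; `rename_feature_type` is a hypothetical helper. Renaming match_part to match leaves the Parent attribute in place, so a browser may still nest or display these features differently.

```python
from typing import Iterable, Iterator

def rename_feature_type(lines: Iterable[str], old: str, new: str) -> Iterator[str]:
    """Yield GFF lines with column 3 set to `new` wherever it equals `old` exactly."""
    for line in lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        if not line.startswith("#") and len(cols) >= 9 and cols[2] == old:
            cols[2] = new
            yield "\t".join(cols)
        else:
            yield line  # comments, headers and other feature types pass through

rows = [
    "NBqld01\tmicrosynteny\tmatch\t75913489\t75936856\t8\t+\t.\tID=blk-1905;Name=blk-1905_to_NbV1Ch01",
    "NBqld01\tmicrosynteny\tmatch_part\t75913489\t75916784\t438.3\t+\t.\tID=blk-1905.1.NBqld01G09150.1;Parent=blk-1905",
]
for out in rename_feature_type(rows, "match_part", "match"):
    print(out.split("\t")[2])  # prints "match" for both lines
```

Because the comparison is on the whole column, a line whose type is already match is left alone, and text inside column 9 is never rewritten.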

There is no Name= tag for the match_part features in the current version. I suppose I could add one, though I'm not sure I understand what you are trying to do with the names. The subject sequence name is included in the Target tag. If you are trying to display the name/number of each gene and/or its target under the blue box, then you need to configure JBrowse/Apollo to display the target for each feature, not just the parent feature. Unfortunately, I don't know how to do that myself.
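A Name= attribute in the requested Name=<target>-vs-<query> form could be added as a post-processing step. The sketch below assumes match_part IDs follow the blk-<n>.<k>.<query ID> pattern seen in the output above, and that the first token of Target is the subject ID (GFF3 writes Target as "<id> <start> <end> [strand]"). `add_pair_name` is hypothetical, and whether JBrowse/Apollo then shows these labels still depends on the track configuration.

```python
def add_pair_name(line: str) -> str:
    """Append Name=<target>-vs-<query> to a match_part line; other lines are returned as-is."""
    line = line.rstrip("\n")
    cols = line.split("\t")
    if len(cols) < 9 or cols[2] != "match_part":
        return line
    attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
    id_parts = attrs.get("ID", "").split(".", 2)  # blk-<n>, <k>, <query ID>
    if "Name" in attrs or "Target" not in attrs or len(id_parts) < 3:
        return line  # do not overwrite or guess when the pattern does not match
    query_id = id_parts[2]
    target_id = attrs["Target"].split(" ")[0]
    cols[8] = cols[8].rstrip(";") + f";Name={target_id}-vs-{query_id}"
    return "\t".join(cols)

row = ("NBqld01\tmicrosynteny\tmatch_part\t75913489\t75916784\t438.3\t+\t.\t"
       "ID=blk-1905.1.NBqld01G09150.1;Parent=blk-1905;"
       "Target=NBlab01G14990.1 90446162 90449165 +")
print(add_pair_name(row).split(";")[-1])  # Name=NBlab01G14990.1-vs-NBqld01G09150.1
```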

Also, it looks like you are getting multiple overlapping blocks, probably due to multiple isoforms at each locus. For clarity, you may want to either remove those or pick a single "canonical" protein for each locus (e.g. keep only t1 from AUGUSTUS), and then run DIAMOND again.
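One way to reduce the input to a single transcript per locus is to keep one mRNA per Parent gene before extracting proteins and rerunning DIAMOND. This sketch uses the longest genomic span as a stand-in for "canonical", which is an assumption; choosing by protein length, or by a t1/.1 suffix, would be equally valid rules. `one_isoform_per_gene` is a hypothetical helper that returns the transcript IDs to keep.

```python
from typing import Dict, List, Tuple

def one_isoform_per_gene(gff_lines: List[str]) -> List[str]:
    """For each Parent gene, return the ID of the mRNA with the longest genomic span.

    Ties keep the first mRNA listed; genes are reported in order of first appearance."""
    best: Dict[str, Tuple[int, str]] = {}
    order: List[str] = []
    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "mRNA":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        gene = attrs.get("Parent", attrs.get("ID", ""))
        span = int(cols[4]) - int(cols[3]) + 1  # GFF coordinates are 1-based, inclusive
        if gene not in best:
            order.append(gene)
            best[gene] = (span, attrs.get("ID", ""))
        elif span > best[gene][0]:
            best[gene] = (span, attrs.get("ID", ""))
    return [best[g][1] for g in order]

rows = [
    "NBqld01\ttransdecoder\tmRNA\t75918271\t75922448\t.\t-\t.\tID=NBqld01G09160.1;Parent=NBqld01G09160",
    "NBqld01\ttransdecoder\tmRNA\t75918532\t75922421\t.\t-\t.\tID=NBqld01G09160.3;Parent=NBqld01G09160",
    "NBqld01\ttransdecoder\tmRNA\t75932007\t75936883\t.\t-\t.\tID=NBqld01G09180.1;Parent=NBqld01G09180",
    "NBqld01\ttransdecoder\tmRNA\t75932157\t75936856\t.\t-\t.\tID=NBqld01G09180.2;Parent=NBqld01G09180",
]
print(one_isoform_per_gene(rows))  # ['NBqld01G09160.1', 'NBqld01G09180.1']
```

Here NBqld01G09180.1 wins on span even though NBqld01G09180.2 has the longer alignments in the DIAMOND table, which is why the selection rule is worth choosing deliberately.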

Hope this helps