statonlab / hardwoods_site

Hardwoods Genomics bugs, data loading, and general issues
GNU General Public License v3.0

GFF generated from mummer: aggregate scaffolds into a single parent feature #312

Open bradfordcondon opened 6 years ago

bradfordcondon commented 6 years ago

Feature description

The walnut GFF tracks we loaded consist of many small, disconnected features (see #227).
I need to create a parent feature per scaffold so that the track displays more cleanly and is easier to read.
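Below is a minimal sketch of the per-scaffold grouping described above, written in Python (the language of the linked `coords_to_gff.py`). It does not reproduce that script. The `Hit` record, its field names, and the `{query}_on_{ref}` parent-ID scheme are hypothetical. Each group of hits becomes one GFF3 `match` parent that spans its children, and the individual hits are written as `match_part` lines pointing back to it via `Parent=`.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    """One alignment block (hypothetical record layout)."""
    ref: str     # reference seqid the hit lies on, e.g. a walnut scaffold
    query: str   # aligned scaffold ID; hits are grouped per (ref, query)
    start: int   # 1-based start on the reference
    end: int     # 1-based inclusive end on the reference
    strand: str  # "+" or "-"


def hits_to_gff3(hits: Iterable[Hit], source: str = "nucmer") -> List[str]:
    """Return GFF3 lines with one `match` parent per (ref, query) pair."""
    groups = defaultdict(list)
    for hit in hits:
        groups[(hit.ref, hit.query)].append(hit)

    lines = ["##gff-version 3"]
    for (ref, query), members in sorted(groups.items()):
        members.sort(key=lambda h: h.start)
        # The parent must enclose every child: min start to max end.
        start = min(h.start for h in members)
        end = max(h.end for h in members)
        strands = {h.strand for h in members}
        strand = strands.pop() if len(strands) == 1 else "."
        parent_id = f"{query}_on_{ref}"  # hypothetical ID scheme
        lines.append("\t".join([ref, source, "match", str(start), str(end),
                                ".", strand, ".", f"ID={parent_id};Name={query}"]))
        for i, h in enumerate(members, 1):
            lines.append("\t".join([ref, source, "match_part", str(h.start),
                                    str(h.end), ".", h.strand, ".",
                                    f"ID={parent_id}.{i};Parent={parent_id}"]))
    return lines
```

The parent line is built from the same group as its children, so every `Parent=` value has a matching `ID=`, and the parent's start/end are simply the minimum start and maximum end of its hits. Write the result with `print("\n".join(hits_to_gff3(hits)))`.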

[Screenshot: 2018-06-22 at 11:43:18 AM]

Link to relevant info

https://github.com/bradfordcondon/simple_biopython/blob/master/staton_biopy/coords_to_gff.py
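If the input to the conversion is MUMmer `show-coords -T` (tab-delimited) output, the rows could be read as below. This is an assumption about the input format, not something taken from the linked script. It assumes the default column order (S1, E1, S2, E2, LEN 1, LEN 2, % IDY) with the reference and query tags as the last two columns, and that reverse-strand hits are reported with the query start greater than the query end.

```python
from typing import Iterable, Iterator, NamedTuple


class CoordsRow(NamedTuple):
    ref_start: int   # [S1]
    ref_end: int     # [E1]
    qry_start: int   # [S2]
    qry_end: int     # [E2]
    identity: float  # [% IDY]
    ref_id: str      # reference tag (second-to-last column)
    qry_id: str      # query tag (last column)

    @property
    def strand(self) -> str:
        # Assumed convention: reverse-strand hits have S2 > E2.
        return "-" if self.qry_start > self.qry_end else "+"


def parse_show_coords_tab(lines: Iterable[str]) -> Iterator[CoordsRow]:
    """Yield alignment rows from tab-delimited show-coords output."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the file-path line, "NUCMER", blank lines and the column header,
        # none of which start with an integer coordinate.
        if len(fields) < 9 or not fields[0].isdigit():
            continue
        s1, e1, s2, e2 = (int(x) for x in fields[:4])
        yield CoordsRow(s1, e1, s2, e2, float(fields[6]), fields[-2], fields[-1])
```

Any open file handle works as input, e.g. `with open(path) as fh: rows = list(parse_show_coords_tab(fh))`.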

bradfordcondon commented 6 years ago

Improved the GFF parser (see https://github.com/bradfordcondon/simple_biopython/issues/1).

The dev server is currently being copied over, so I can't test whether we've achieved the desired result until tomorrow.

bradfordcondon commented 6 years ago

Hmm, features with no ID reference a parent that wasn't created.

sudo ../jbrowse/bin/flatfile-to-json.pl --gff stevens_gff_v2/wingnut_chandler_v2.gff --tracklabel p_stenoptera_new_test

GFF3 parse error: some features reference other features that do not exist in the file (or in the same '###' scope).  A list of them:
 ID                 |           Cannot Find
----------------------------------------------------------------------
(no id)             | Parent=C840987
(no id)             | Parent=C840987
 at /home/data/apollo/jbrowse/bin/../src/perl5/Bio/JBrowse/ConfigurationManager.pm line 7.
    Bio::JBrowse::ConfigurationManager::__ANON__('\x{a}GFF3 parse error: some features reference other features tha...') called at /home/data/apollo/jbrowse/bin/../src/perl5/../../extlib/lib/perl5/Bio/GFF3/LowLevel/Parser.pm line 195
    Bio::GFF3::LowLevel::Parser::_buffer_all_under_construction_features('Bio::GFF3::LowLevel::Parser=HASH(0x2096eb8)') called at /home/data/apollo/jbrowse/bin/../src/perl5/../../extlib/lib/perl5/Bio/GFF3/LowLevel/Parser.pm line 168
    Bio::GFF3::LowLevel::Parser::_buffer_items('Bio::GFF3::LowLevel::Parser=HASH(0x2096eb8)') called at /home/data/apollo/jbrowse/bin/../src/perl5/../../extlib/lib/perl5/Bio/GFF3/LowLevel/Parser.pm line 73
    Bio::GFF3::LowLevel::Parser::next_item('Bio::GFF3::LowLevel::Parser=HASH(0x2096eb8)') called at /home/data/apollo/jbrowse/bin/../src/perl5/Bio/JBrowse/FeatureStream/GFF3_LowLevel.pm line 16
    Bio::JBrowse::FeatureStream::GFF3_LowLevel::next_items('Bio::JBrowse::FeatureStream::GFF3_LowLevel=HASH(0x1fa0800)') called at /home/data/apollo/jbrowse/bin/../src/perl5/Bio/JBrowse/Cmd/NCFormatter.pm line 52
    Bio::JBrowse::Cmd::NCFormatter::_format('Bio::JBrowse::Cmd::FlatFileToJson=HASH(0x17c6150)', 'trackConfig', 'HASH(0x19b0c30)', 'featureStream', 'Bio::JBrowse::FeatureStream::GFF3_LowLevel=HASH(0x1fa0800)', 'featureFilter', 'CODE(0x1f99778)', 'trackLabel', 'p_stenoptera_new_test', ...) called at /home/data/apollo/jbrowse/bin/../src/perl5/Bio/JBrowse/Cmd/FlatFileToJson.pm line 128
    Bio::JBrowse::Cmd::FlatFileToJson::run('Bio::JBrowse::Cmd::FlatFileToJson=HASH(0x17c6150)') called at ../jbrowse/bin/flatfile-to-json.pl line 9

I can confirm that, for some reason, this scaffold (C840987) wasn't written to the GFF as a parent feature.
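A quick pre-check along these lines could catch dangling parents before running `flatfile-to-json.pl`. It is a hypothetical helper and is simpler than the JBrowse parser: it checks `Parent=` references against every `ID=` in the whole file rather than within each `###` scope, and it does not URL-decode attribute values.

```python
from typing import Iterable, List, Tuple


def find_missing_parents(gff_lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, parent ID) for each Parent= value with no matching ID=."""
    ids = set()
    parent_refs = []
    for lineno, line in enumerate(gff_lines, 1):
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a feature line (e.g. sequence after ##FASTA)
        attrs = dict(
            pair.strip().split("=", 1) for pair in fields[8].split(";") if "=" in pair
        )
        if "ID" in attrs:
            ids.add(attrs["ID"])
        # Parent may list several IDs separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                parent_refs.append((lineno, parent))
    return [(n, p) for n, p in parent_refs if p not in ids]
```

For example, `with open("stevens_gff_v2/wingnut_chandler_v2.gff") as fh: print(find_missing_parents(fh))` would list every line whose parent was never written.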

bradfordcondon commented 6 years ago

Fixed a bug in the script that printed the previous parent instead of the current one. I'll wait until the sync finishes before resubmitting, though.
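The script's loop isn't quoted in this thread, so the following is only a generic sketch of how this class of bug is usually avoided when streaming hits that are already sorted by parent: flush the finished group under its own key before switching to the new one, and flush the last group once the loop ends. The function name and the `(key, start, end)` row shape are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def stream_parents(
    rows: Iterable[Tuple[str, int, int]]
) -> Iterator[Tuple[str, List[Tuple[int, int]]]]:
    """Yield (parent key, children) for rows pre-sorted by parent key."""
    current_key = None
    children: List[Tuple[int, int]] = []
    for key, start, end in rows:
        if key != current_key and children:
            # Flush the group that just ended, labelled with *its own* key,
            # not the key of the row that triggered the flush.
            yield current_key, children
            children = []
        current_key = key
        children.append((start, end))
    if children:
        yield current_key, children  # the final group has no successor row
```

Keeping the key and its children together in one yielded pair means the parent line can never be written with a neighbouring group's ID.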

bradfordcondon commented 6 years ago

I also fixed a problem where the parent feature's start/end coordinates weren't set properly.

I've uploaded the new GFF to dev and added a test track. The permissions issue (#314) needs to be fixed before I can verify.

almasaeed2010 commented 6 years ago

#314 is now fixed.

bradfordcondon commented 6 years ago

https://hardwoods.ag.utk.edu/tools/jbrowse/index.html?data=english_walnut&loc=chloroplast%3A81637..83544&tracks=p_stenoptera_new_test_v%2Cgene&highlight=

[Screenshot: 2018-06-27 at 3:28:31 PM]

A step forward.

bradfordcondon commented 6 years ago

The image above shows the most problematic scaffold: chloroplast.

[Screenshot: 2018-06-27 at 3:37:22 PM]

or

https://hardwoods.ag.utk.edu/tools/jbrowse/index.html?data=english_walnut&loc=jcf7180001221206%3A1..358591&tracks=p_stenoptera_new_test_v%2Cgene&highlight=

[Screenshot: 2018-06-27 at 3:39:22 PM]

I still think a single hit at the start or end of a scaffold will cause the parent alignment to span the whole scaffold, instead of showing a connected region with a black line in the middle.
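One possible way to address this, sketched under assumptions: split each scaffold's hits into clusters wherever consecutive hits (sorted by reference start) are more than `max_gap` bases apart, and give each cluster its own parent. The `max_gap` threshold is a hypothetical parameter that would need tuning, and the sketch only looks at reference coordinates, ignoring query order and strand.

```python
from typing import Iterable, List, Tuple


def split_by_gap(
    intervals: Iterable[Tuple[int, int]], max_gap: int
) -> List[List[Tuple[int, int]]]:
    """Split (start, end) hits into clusters separated by gaps > max_gap."""
    clusters: List[List[Tuple[int, int]]] = []
    cluster_end = 0
    for start, end in sorted(intervals):
        # Measure the gap from the furthest end seen in the current cluster,
        # so a short hit nested inside a long one doesn't shrink the cluster.
        if clusters and start - cluster_end <= max_gap:
            clusters[-1].append((start, end))
            cluster_end = max(cluster_end, end)
        else:
            clusters.append([(start, end)])
            cluster_end = end
    return clusters
```

Each returned cluster would then get its own parent ID, so an isolated hit near a scaffold end becomes a separate short feature rather than stretching one parent across the whole scaffold.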

Also, filtering on the total size of each parent feature could neatly remove the repeat-driven hits.
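A sketch of that filter, assuming "size" means the reference span of the parent (first child start to last child end); summing the child lengths instead would measure aligned bases. The `min_span` cutoff and the input shape (parent ID mapped to a list of 1-based, inclusive `(start, end)` hits) are hypothetical.

```python
from typing import Dict, List, Tuple

Hits = List[Tuple[int, int]]


def filter_small_parents(parents: Dict[str, Hits], min_span: int) -> Dict[str, Hits]:
    """Keep only parents whose reference span is at least min_span bp."""
    kept: Dict[str, Hits] = {}
    for parent_id, hits in parents.items():
        # 1-based inclusive coordinates: span = max end - min start + 1.
        span = max(e for _, e in hits) - min(s for s, _ in hits) + 1
        if span >= min_span:
            kept[parent_id] = hits
    return kept
```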

mestato commented 6 years ago

@mestato: ask Jiali to rerun nucmer on these genomes and see whether we can get a more sensible GFF.

bradfordcondon commented 6 years ago