hoffmangroup / genomedata

The Genomedata format for storing large-scale functional genomics data.
https://genomedata.hoffmanlab.org/
GNU General Public License v2.0

Chromosome name mapping from assembly reports #56

Closed EricR86 closed 3 years ago

EricR86 commented 3 years ago

This PR adds support for chromosome name mapping during the load-seq step in Genomedata creation. These options are available both in genomedata-load-seq and genomedata-load.

The mapping is read from a delimited text file (most likely an assembly report from NCBI). Fields are assumed to be separated by tabs, and lines starting with # are treated as comments. The header fields are assumed to be in a commented line above the non-commented data rows.
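To illustrate the file layout described above, here is a minimal sketch of how such a report could be parsed. It is not the code from this PR: the function name `read_name_mapping` and its parameters are hypothetical, and it assumes the header is the last commented line before the first data row. The sample rows are an abridged excerpt in the style of an NCBI assembly report, keeping only four of its columns.

```python
import csv
import io


def read_name_mapping(lines, from_field, to_field, comment="#", delimiter="\t"):
    """Return {from_name: to_name} from a tab-delimited report whose
    header sits in a commented line above the data rows.

    Hypothetical helper for illustration, not genomedata's API.
    """
    header = None
    data_rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if line.startswith(comment):
            if not data_rows:
                # Assumption: the last comment line before the data is the header
                header = line.lstrip(comment).strip().split(delimiter)
            continue
        data_rows.append(line)
    if header is None:
        raise ValueError("no commented header line found")
    reader = csv.DictReader(data_rows, fieldnames=header, delimiter=delimiter)
    return {row[from_field]: row[to_field] for row in reader}


# Abridged sample in the style of an NCBI assembly report
REPORT = (
    "# Assembly name:  ExampleAssembly\n"
    "# Sequence-Name\tGenBank-Accn\tRefSeq-Accn\tUCSC-style-name\n"
    "1\tCM000663.2\tNC_000001.11\tchr1\n"
    "MT\tJ01415.2\tNC_012920.1\tchrM\n"
)

refseq_to_ucsc = read_name_mapping(
    io.StringIO(REPORT), "RefSeq-Accn", "UCSC-style-name"
)
genbank_to_ucsc = read_name_mapping(
    io.StringIO(REPORT), "GenBank-Accn", "UCSC-style-name"
)
print(refseq_to_ucsc["NC_000001.11"])
print(genbank_to_ucsc["J01415.2"])
```

Choosing the source and destination columns by header name means the same report can serve inputs named by RefSeq or by GenBank accession.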

Tests are added using an assembly report from NCBI and two AGP files: one with a RefSeq accession ID and another with a GenBank accession ID. Both are mapped to UCSC-style names.

EricR86 commented 3 years ago

@michaelmhoffman this is ready for your review