ryanlayer / giggle

Interval data structure
MIT License

Feature request - bed file header read/write/bulk retrieve #42

Open pfsulliv opened 5 years ago

pfsulliv commented 5 years ago

Added here at Ryan Layer's request.

I put a post on bedtools-discuss (https://groups.google.com/forum/#!topic/bedtools-discuss/t6E74mCQb-E), suggesting support for adding and retrieving headers in bed files. I am not suggesting that one add anything as extensive as the VCF header, but a handful of clearly defined “##” fields would be exceptionally useful (genome reference, organism, description, and source would seem to be key).

pfsulliv commented 5 years ago

bedtools-discuss post:

Feature request: capacity for bedtools to add and extract machine-readable metadata. I have searched this group for "metadata" and "comment" and only found a 2015 feature request (https://groups.google.com/d/msg/bedtools-discuss/TetrJYsJHX4/af5fEo1UAAAJ). Forgive me if I've missed something.

My concern is error and reproducibility. We now include a lot of information in the bloody file name but in highly variable ways; I doubt anyone believes this is optimal. VCF is in a sense the opposite, as VCF files usually have exhaustive headers that explicitly define pretty much everything in the file (some headers run to hundreds of lines). I am not arguing for the full VCF approach.

However, what about defining a reasonable but minimal set of header lines for bed files and adding support in bedtools to write and read them? Ideally, there would be a way to extract these from many files and to make them into a table (that could be put into a supplement). Comments appear to be possible in bed files (per UCSC), as lines beginning with "#".

##fileDate="20190710"
##preparedBy="JohnSmith(UnivOfSomewhere)"
##genomeReference="GRCh37"
##organism="H.sapiens"
##description="H3K4me3, MACS narrow peaks from human cortex (N=9)"
##source=
##codeString="curl something | bedtools merge -i - | gzip -c > myfile.bed.gz"

The above are just some ideas cobbled together from various sources.
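To make the "extract from many files into a table" idea concrete, here is a rough Python sketch. It is not an existing bedtools or giggle feature. It assumes a hypothetical header syntax in which each metadata line has the form `##key="value"` (quotes optional) and appears before the first data line; the function names `read_bed_header` and `headers_to_table` are made up for illustration.

```python
import csv
import gzip
import re
import sys

# Assumed (hypothetical) header syntax: ##key="value", quotes optional.
HEADER_RE = re.compile(r'^##(\w+)=(.*)$')


def open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_bed_header(lines):
    """Collect ##key=value pairs from the leading '#' lines of a BED stream."""
    meta = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.startswith("#"):
            break  # the first non-comment line ends the header
        match = HEADER_RE.match(line)
        if match:  # ordinary '#' comments are skipped, not stored
            key, value = match.group(1), match.group(2).strip()
            if len(value) >= 2 and value[0] == value[-1] == '"':
                value = value[1:-1]  # drop surrounding quotes
            meta[key] = value
    return meta


def headers_to_table(paths, out=sys.stdout):
    """Write one tab-separated row of header fields per BED file."""
    rows = []
    for path in paths:
        with open_text(path) as handle:
            rows.append((path, read_bed_header(handle)))
    # Union of all keys seen, so files with missing fields get empty cells.
    keys = sorted({key for _, meta in rows for key in meta})
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["file"] + keys)
    for path, meta in rows:
        writer.writerow([path] + [meta.get(key, "") for key in keys])
```

Calling `headers_to_table(["a.bed", "b.bed.gz"])` would print a table with one row per file and one column per field, which is roughly what could go into a supplement. Only the leading comment lines are read, so the cost per file stays small even for large bed files.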

All of this is open to modification of any sort.