pkerpedjiev / negspy

Python NGS Repository
MIT License

could you add galGal6? #4

Open gibcus opened 5 years ago

gibcus commented 5 years ago

http://hgdownload.cse.ucsc.edu/goldenpath/galGal6/bigZips/galGal6.chrom.sizes

pkerpedjiev commented 5 years ago

If you want, you can just copy that file to negspy/negspy/data/galGal6/chromInfo.txt and create a PR.

Or I can add it this weekend :-)

If you choose to add it and create a PR, just make sure that the chromosome order matches negspy/negspy/data/galGal5/chromInfo.txt.
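One way to check the ordering before opening a PR is sketched below. It assumes that "matches" means the chromosome names shared by both files appear in the same relative order, and that the first whitespace-separated column of each line is the chromosome name; the function names are hypothetical and not part of negspy.

```python
def read_chrom_names(lines):
    """Return chromosome names (first column), skipping blank lines."""
    return [line.split()[0] for line in lines if line.strip()]


def same_relative_order(reference, candidate):
    """True if the names present in both lists appear in the same relative order.

    Assumption: names that exist in only one of the two files are ignored.
    """
    shared = set(reference) & set(candidate)
    ref_shared = [name for name in reference if name in shared]
    cand_shared = [name for name in candidate if name in shared]
    return ref_shared == cand_shared
```

To use it, read both chromInfo.txt files with `open(path).readlines()`, pass each through `read_chrom_names`, and compare the galGal5 list (reference) against the galGal6 list (candidate).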

gibcus commented 5 years ago

Sweet, I copied it!

sergpolly commented 5 years ago

@pkerpedjiev we did add data/galGal6/chromInfo.txt to the repo - see PR. We have a couple of questions about order/contents, though:

gibcus commented 5 years ago

Now I have some questions regarding a new genome as an assembly. I wanted to use clodius to aggregate a bedpe with galGal6 coordinates; i.e.:

clodius aggregate bedpe \
--assembly galGal6 \
--chr1-col 1 \
--from1-col 2 \
--to1-col 3 \
--chr2-col 1 \
--from2-col 2 \
--to2-col 3 \
-o /dir/file.insulation.bed.multires \
--has-header /dir/file.insulation_masked.bed \
--chromsizes-filename /dir/galGal6.reduced.chrom.size

I made a "reduced" chrom.size file to exclude all the contigs and keep only whole chromosomes. Would having a galGal6 ChromInfo be sufficient to make HiGlass aware of a galGal6 assembly? And if so, would updating any HiGlass components on our server be required to get the ChromInfo.txt from Negspy?
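For reference, a "reduced" file like that could be produced with a small filter along these lines. This is only a sketch: it assumes that contig and unplaced-scaffold names contain an underscore (as in names such as chrUn_... or ..._random), which should be checked against the actual galGal6.chrom.sizes before relying on it. The function name is hypothetical.

```python
def keep_whole_chromosomes(lines):
    """Return (name, size) pairs for entries that look like whole chromosomes.

    Assumption: any name containing "_" is a contig or scaffold and is dropped.
    """
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, size = fields[0], int(fields[1])
        if "_" in name:
            continue  # assumed to be a contig, not a whole chromosome
        kept.append((name, size))
    return kept
```

Writing the result back out as tab-separated `name<TAB>size` lines, in the original order, gives a file in the same two-column format as the input.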

pkerpedjiev commented 5 years ago

Would having a galGal6 ChromInfo be sufficient to make HiGlass aware of a galGal6 assembly?

This is a bit of a touchy topic :-)

The short answer is yes. The long answer is that it depends on what you mean by "make HiGlass aware of a galGal6 assembly".

HiGlass doesn't technically have a notion of an assembly. It only displays data where it's told to display it. When you aggregate a bedfile using --chromsizes-filename, the aggregation uses the lengths of the chromosomes to determine the offsets of the bedfile entries from the 0 position. So if you do that aggregation and load the resulting beddb file in HiGlass, you'll see the bedfile entries displayed as if the chromosomes in the chromsizes file were laid end to end. Hence, short answer, yes :-)
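The "laid end to end" idea can be illustrated with a short sketch. This is not the clodius implementation, just the arithmetic described above: each chromosome starts where the previous ones end, and an entry's position is its chromosome's offset plus its position within that chromosome. The function names are hypothetical.

```python
from itertools import accumulate


def chrom_offsets(chromsizes):
    """Map each chromosome name to the genome-wide position where it starts.

    `chromsizes` is an ordered list of (name, length) pairs, in the same
    order as the chromsizes file.
    """
    names = [name for name, _ in chromsizes]
    sizes = [size for _, size in chromsizes]
    # The first chromosome starts at 0; each later one starts at the
    # cumulative length of all chromosomes before it.
    starts = [0] + list(accumulate(sizes))[:-1]
    return dict(zip(names, starts))


def to_genome_position(offsets, chrom, pos):
    """Convert a (chromosome, position) pair to a single genome-wide coordinate."""
    return offsets[chrom] + pos
```

Because the offsets depend only on the order and lengths in the chromsizes file, using a different file (or a different chromosome order) for aggregation and for display would shift where entries appear.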

Now, if you want to see which chromosomes correspond to which positions along the x-axis or to have the search bar display "assembly" coordinates, you'll need to register the chromsizes file using:

higlass-manage ingest --filetype chromsizes-tsv --datatype chromsizes --assembly galGal6 negspy/data/galGal6/chromInfo.txt

If you would like to be able to search for gene annotations in that assembly, you'll need to create a gene annotations track: https://docs.higlass.io/data_preparation.html#gene-annotation-tracks

Does that help or at least make sense?

gibcus commented 5 years ago

This helps a lot and makes sense! I'll try the ingest.

On the gene annotations: It was very straightforward to use your bash commands to generate the geneAnnotations.bed. However, the instructions mention exonU.py to create geneAnnotationsExonUnions.bed and I could not locate the python script.

pkerpedjiev commented 5 years ago

Oh, sorry about that. It's in the clodius repo here:

https://github.com/higlass/clodius/blob/develop/scripts/exonU.py

Docs sure need a bit of work 🤔

gibcus commented 5 years ago

Great! Thanks a lot Peter.