wyang17 / SQuIRE

Software for Quantifying Interspersed Repeat Expression

Non-model species #53

Open RRebo opened 3 years ago

RRebo commented 3 years ago

Hi,

I'm quite excited to use SQuIRE, but I work on a non-UCSC genome, so there are no files to "fetch". Would it be possible to document the formats of the "fetch" files, and perhaps also add an option to supply our own correctly formatted files? Otherwise, could I tweak the code to use my own files (#notabioinformaticienhere)?

Thank you very much, Rita

CeciliaDeng commented 3 years ago

Hi @wyang17, I have a similar request. We've assembled a non-model species genome (in FASTA format), with de novo TE annotation and gene predictions (both in GFF3 files). I suppose we can skip the "fetch" step and build the STAR index outside SQuIRE? Are there any other files we need to provide so that we can run the SQuIRE "clean" step? Thank you.

virginie-me commented 3 years ago

Hi, did you solve your problem with the non-model species, in particular for a TE annotation file provided in GFF3 format?

Thank you, Virginie

RRebo commented 3 years ago

Hi, no, I haven't yet, but I'm working with a student right now to figure this out. The idea is to put in the Fetch folder the same files that Fetch would get from UCSC. The RepeatMasker file can be supplied at the Clean step. So for now, if I understood correctly, you need your genome FASTA, a chrom_info file with the chromosome sizes, and a gene annotation file. We would then modify the Fetch script to take these files from the Fetch folder and proceed with the usual commands. The only complicated part right now is getting the refGene file formatted the way UCSC does it, but we are working on it (since this morning :) ), and I'll keep you informed if we succeed. I'd like to end up with something other people can use, so we don't all go through the same problems!
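For the chrom_info file mentioned above, here is a minimal Python sketch that derives chromosome sizes from a genome FASTA. It assumes a two-column, tab-separated table (sequence name, length) is enough; the exact file name and columns that SQuIRE expects are not stated in this thread, so compare the output with what Fetch downloads for a UCSC build. The function names `fasta_chrom_sizes` and `write_chrom_info` are made up for illustration.

```python
import io


def fasta_chrom_sizes(handle):
    """Yield (name, length) for each record in a FASTA stream, in file order."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # The sequence name is the first whitespace-delimited token of the header.
            tokens = line[1:].split()
            name, length = (tokens[0] if tokens else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def write_chrom_info(fasta_path, out_path):
    """Write one 'name<TAB>length' line per sequence (assumed layout)."""
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        for name, length in fasta_chrom_sizes(fasta):
            out.write(f"{name}\t{length}\n")


# Small in-memory demonstration.
demo = io.StringIO(">chr1 test assembly\nACGT\nAC\n>chr2\nGGG\n")
sizes = list(fasta_chrom_sizes(demo))
```

On a real assembly you would call `write_chrom_info("genome.fa", "chrom_info.txt")`, where both paths are placeholders.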

RRebo commented 3 years ago

Update: squire Map works with our modified script. We'll try squire Count and keep you informed. R

RRebo commented 3 years ago

Ok, seems to work... Here is what we have done, but seriously, it is nothing much. The main problem is creating the files that you need to run squire... We'll try to make it easier, and I'll update this thing, but at least it gives you an idea of what you would need: https://hackmd.io/@unleash/squireNonModel R

bteefy commented 2 years ago

Hi @RRebo. Thank you for putting together your hack for non-model species. It is very helpful. I was able to generate the 2bit genome file using faToTwoBit from the UCSC utilities. However, I am having some issues converting our gene and TE annotations to the RefSeq format. Do you have any tips on how to make this conversion?

Best, Bryan
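For the gene annotation part of this question, here is a rough Python sketch of turning a simple GFF3 into rows laid out like UCSC's refGene table (bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, score, name2, cdsStartStat, cdsEndStat, exonFrames). It rests on several assumptions: transcripts are `mRNA` or `transcript` features with an `ID`, exons and CDS point to them through `Parent`, and placeholder values (`unk`, `-1`) are acceptable for the status and frame columns. Which columns SQuIRE actually reads is not established in this thread, and the function names are hypothetical. The UCSC utilities also include a `gff3ToGenePred` converter, which may be a sturdier route.

```python
import io
from collections import defaultdict


def ucsc_bin(start, end):
    """Standard UCSC bin for a 0-based, half-open interval."""
    start_bin, end_bin = start >> 17, (end - 1) >> 17
    for offset in (585, 73, 9, 1, 0):
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= 3
        end_bin >>= 3
    raise ValueError("interval too large for the standard binning scheme")


def parse_attributes(field):
    """Turn 'ID=t1;Parent=g1' into {'ID': 't1', 'Parent': 'g1'}."""
    return dict(item.split("=", 1) for item in field.strip().split(";") if "=" in item)


def gff3_to_refgene_lines(handle):
    """Return refGene-style, tab-separated lines, one per transcript."""
    transcripts = {}            # transcript ID -> (chrom, strand, gene name)
    exons = defaultdict(list)   # transcript ID -> [(start0, end), ...]
    cds = defaultdict(list)
    for line in handle:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            continue
        chrom, _, ftype, start, end, _, strand, _, attrs = cols
        attr = parse_attributes(attrs)
        # GFF3 is 1-based inclusive; refGene coordinates are 0-based half-open.
        start0, end1 = int(start) - 1, int(end)
        if ftype in ("mRNA", "transcript") and "ID" in attr:
            transcripts[attr["ID"]] = (chrom, strand, attr.get("Parent", attr["ID"]))
        elif ftype in ("exon", "CDS"):
            target = exons if ftype == "exon" else cds
            for parent in attr.get("Parent", "").split(","):
                target[parent].append((start0, end1))
    lines = []
    for tid, (chrom, strand, gene) in transcripts.items():
        # Fall back to CDS blocks when no exon features are annotated.
        blocks = sorted(exons.get(tid) or cds.get(tid, []))
        if not blocks:
            continue
        tx_start, tx_end = blocks[0][0], blocks[-1][1]
        if cds.get(tid):
            cds_start = min(s for s, _ in cds[tid])
            cds_end = max(e for _, e in cds[tid])
        else:
            cds_start = cds_end = tx_end  # empty CDS marks a non-coding transcript
        row = [
            ucsc_bin(tx_start, tx_end), tid, chrom, strand,
            tx_start, tx_end, cds_start, cds_end, len(blocks),
            "".join(f"{s}," for s, _ in blocks),
            "".join(f"{e}," for _, e in blocks),
            0, gene, "unk", "unk",     # score and CDS status: placeholders
            "-1," * len(blocks),       # exon frames: placeholder, not computed
        ]
        lines.append("\t".join(str(value) for value in row))
    return lines


demo_gff3 = io.StringIO(
    "##gff-version 3\n"
    "chr1\tpred\tgene\t101\t500\t.\t+\t.\tID=g1\n"
    "chr1\tpred\tmRNA\t101\t500\t.\t+\t.\tID=t1;Parent=g1\n"
    "chr1\tpred\texon\t101\t200\t.\t+\t.\tParent=t1\n"
    "chr1\tpred\texon\t401\t500\t.\t+\t.\tParent=t1\n"
    "chr1\tpred\tCDS\t151\t200\t.\t+\t0\tParent=t1\n"
    "chr1\tpred\tCDS\t401\t450\t.\t+\t2\tParent=t1\n"
)
refgene_lines = gff3_to_refgene_lines(demo_gff3)
```

The TE annotation is a separate problem: it would have to match the RepeatMasker table that Clean takes, and this sketch does not attempt that.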

hanshanmengqi commented 1 year ago

Dear Bryan,

@bteefy Did you solve the issue you ran into? I am also struggling with the RefSeq format.

Best, Hanshan