The file that I've been using is the SQL table output from UCSC, but the SAF file would work if it contains the right columns. I have to confess that the script doesn't really support GTF as an input (since GTF doesn't put the TE name in a separate column), but other column-based files (e.g. BED or tab-delimited annotations) should work.
Please use the column index (1, 2, 3 etc). Sorry if this wasn't clear.
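If it helps, the 1-based indices can be looked up from the column names rather than counted by hand. This is only a minimal sketch: it assumes the standard column order of the UCSC rmsk table, and `column_indices` is a hypothetical helper, not part of makeTEgtf.pl.

```python
# Column order of the UCSC rmsk table (assumed to match your download).
RMSK_HEADER = (
    "bin swScore milliDiv milliDel milliIns genoName genoStart genoEnd "
    "genoLeft strand repName repClass repFamily repStart repEnd repLeft id"
).split()


def column_indices(header, wanted):
    """Return the 1-based column index for each requested column name."""
    # UCSC header lines often start with '#', so strip it before matching.
    cleaned = [name.lstrip("#") for name in header]
    return {name: cleaned.index(name) + 1 for name in wanted}


indices = column_indices(
    RMSK_HEADER,
    ["genoName", "genoStart", "genoEnd", "strand",
     "repName", "repFamily", "repClass", "swScore"],
)
for name, idx in indices.items():
    print(f"{name}\t{idx}")
```

If your SAF file has a different column order, pass its own header line (split on tabs) instead of `RMSK_HEADER`.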
I use the swScore column as the score, but that's optional (it is not used by TEtranscripts itself).
Your output file looks reasonable.
Your assumption about "_dup" is correct. The first instance does not have "_dup" in its transcript_id.
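To illustrate the naming rule, here is a rough sketch of how unique transcript_ids could be assigned in file order. It is not the code from makeTEgtf.pl: `assign_transcript_ids` is a hypothetical function, and the suffix numbering starting at 1 is an assumption.

```python
from collections import defaultdict


def assign_transcript_ids(te_names):
    """Give each TE copy a unique transcript_id, in file order."""
    seen = defaultdict(int)  # number of copies of each TE name seen so far
    ids = []
    for name in te_names:
        count = seen[name]
        # The first copy keeps the bare name; later copies get a numbered
        # "_dup" suffix (numbering from 1 is an assumption).
        ids.append(name if count == 0 else f"{name}_dup{count}")
        seen[name] += 1
    return ids


print(assign_transcript_ids(["AluY", "L1HS", "AluY", "AluY"]))
# prints ['AluY', 'L1HS', 'AluY_dup1', 'AluY_dup2']
```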
The best way to test if the file is correct would be to try it with TEtranscripts. If it's parsed correctly, then you should be able to run it without too many issues. We are thinking about a "GTF checker", and might integrate it with another tool that we're developing.
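In the meantime, a quick sanity check along these lines might catch obvious formatting problems before running TEtranscripts. This is only a sketch, not the checker mentioned above: `check_te_gtf_line` is a hypothetical function, and the list of required attributes (gene_id, transcript_id, family_id, class_id) is an assumption about what a TE GTF should carry.

```python
import re

# Attribute keys assumed to be required in a TE GTF (an assumption).
REQUIRED_KEYS = ("gene_id", "transcript_id", "family_id", "class_id")
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def check_te_gtf_line(line):
    """Return a list of problems found in one GTF line (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return [f"expected 9 tab-separated fields, found {len(fields)}"]
    problems = []
    # Fields 4 and 5 (1-based) are the start and end coordinates.
    if not (fields[3].isdigit() and fields[4].isdigit()):
        problems.append("start/end are not integers")
    elif int(fields[3]) > int(fields[4]):
        problems.append("start is greater than end")
    if fields[6] not in ("+", "-", "."):
        problems.append(f"unexpected strand {fields[6]!r}")
    # Field 9 holds the key "value"; attribute pairs.
    attrs = dict(ATTR_RE.findall(fields[8]))
    for key in REQUIRED_KEYS:
        if key not in attrs:
            problems.append(f"missing attribute {key}")
    return problems


# A made-up line for illustration only, not taken from a real annotation.
example = (
    "chr1\thg38_rmsk\texon\t100\t400\t1000\t+\t.\t"
    'gene_id "AluY"; transcript_id "AluY_dup1"; '
    'family_id "Alu"; class_id "SINE";'
)
print(check_te_gtf_line(example))
```

An empty list means the line passed these basic checks; it does not guarantee that TEtranscripts will accept the file.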
Hi! I have read the content of this issue. Three questions puzzled me about using the Perl script "makeTEgtf.pl".
1) [INFILE]: which format of TE annotation file is needed, GTF or SAF? The GTF file downloaded from the UCSC rmsk track doesn't seem to include the TE name/family/class. Here is the first row of my GTF file:
But the SAF file does include this information.
2) Should I give the column name (genoName, genoStart, genoEnd...) or the column index (1, 2, 3...)?
3) Is the swScore in the SAF file the score you mention in the Perl script?
Just now, I tried using the SAF file as input and giving the columns as numeric indices. The result is shown below; could you help me check whether it is right? The first two rows:
The AluY with multiple locations (the transcript_id of the first AluY location is named AluY, without "_dup"; is that right?):
Best wishes! Hanwen Yu 30th March, 2020