mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

Error for differential expression analysis #67

Closed BenjaminSchwessinger closed 4 years ago

BenjaminSchwessinger commented 4 years ago

Hi there, thanks for making TEtranscripts. I am running version 2.14.

I get the following error and I am unsure what file is actually wrong.

INFO @ Tue, 21 Apr 2020 01:35:37: Finished processing sample files
INFO @ Tue, 21 Apr 2020 01:35:37: Generating counts table
INFO @ Tue, 21 Apr 2020 01:35:38: Performing differential analysis ...

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
  line 37218 did not have 7 elements
Calls: read.table -> scan
Execution halted

The cntTable file looks good as inspected via command line and in pandas.

olivertam commented 4 years ago

Hi,

Would you be able to provide the XXXX.cntTable file? Based on the error message, I suspect there is an odd character on line 37218 (an extra tab, a # symbol, or a stray quote ') that is throwing off the read.table R command.

Thanks.

BenjaminSchwessinger commented 4 years ago

Thanks for the quick reply. I changed the file extension to .txt so I could attach it: haustoria_infectedWheat7.txt

olivertam commented 4 years ago

Hi,

It looks like the error is actually on line 37219 of your file (line 37218 excluding the header line):

BNINTMO_#2_ClassI_LTR_Gypsy:LTR_Gypsy:ClassI_LTR    37  53  29  29  43  63

In fact, I noticed many more lines that contain the # symbol. In R, # marks the start of a comment, and read.table treats it that way by default (comment.char = "#"), so the rest of the line is ignored. Everything after the # is "lost", which is why R reports that the line has too few columns. The quickest way to fix this is to substitute every # with something else (or remove them completely).
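To find every affected row before rerunning, a small check along these lines may help. It is only a rough sketch in plain Python: it assumes a tab-delimited count table with a single header line, and it imitates the "drop everything after #" behaviour described above without reproducing R's quote handling. The function name and the three-column sample data are made up for illustration.

```python
import io


def find_comment_truncated_rows(handle, sep="\t", comment_char="#"):
    """Return (line_number, fields_seen, fields_expected, line) for each row
    that would lose columns if text after comment_char were discarded."""
    header = handle.readline().rstrip("\n")
    expected = len(header.split(sep))
    problems = []
    # Line numbers are 1-based and include the header, so data starts at 2.
    for line_number, line in enumerate(handle, start=2):
        line = line.rstrip("\n")
        # Keep only the part of the line before the first comment character.
        visible = line.split(comment_char, 1)[0]
        seen = len(visible.split(sep)) if visible else 0
        if seen != expected:
            problems.append((line_number, seen, expected, line))
    return problems


# Illustrative data only: a header, one clean row, one row with '#'.
sample = (
    "gene/TE\ts1\ts2\n"
    "GENE1\t5\t7\n"
    "BNINTMO_#2_ClassI_LTR_Gypsy:LTR_Gypsy:ClassI_LTR\t37\t53\n"
)
problems = find_comment_truncated_rows(io.StringIO(sample))
for line_number, seen, expected, line in problems:
    print(f"line {line_number}: {seen} of {expected} fields visible: {line}")
```

For a real file, pass an open file handle instead of the `io.StringIO` sample. The reported line numbers count the header, so they will be one higher than the number in the R error message.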

Thanks.

BenjaminSchwessinger commented 4 years ago

Great. Thanks. I didn't know about this default behaviour in R, as pandas handled the file just fine.

I will replace # with * in the TE GTF file using sed and report back here on whether it worked.
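For anyone who would rather script the substitution than use sed, the same idea can be sketched in Python. This is a minimal sketch, not part of TEtranscripts: the function names are hypothetical, the sample GTF record is invented for illustration, and it assumes that lines starting with # are GTF header comments that should be left alone.

```python
def fix_gtf_line(line, replacement="*"):
    """Replace '#' in a GTF record; leave '#'-prefixed header lines untouched."""
    if line.startswith("#"):
        return line
    return line.replace("#", replacement)


def rewrite_gtf(src_path, dst_path, replacement="*"):
    """Copy src_path to dst_path, applying fix_gtf_line to every line."""
    with open(src_path) as fin, open(dst_path, "w") as fout:
        for line in fin:
            fout.write(fix_gtf_line(line, replacement))


# Illustrative record only (coordinates and source column are made up).
record = (
    'chr1\tTE_annot\texon\t100\t900\t.\t+\t.\t'
    'gene_id "BNINTMO_#2_ClassI_LTR_Gypsy"; family_id "LTR_Gypsy"; '
    'class_id "ClassI_LTR";\n'
)
fixed = fix_gtf_line(record)
header = fix_gtf_line("#!genome-build example\n")
print(fixed, end="")
```

The count table has to be regenerated from the edited GTF afterwards, since the TE names in the existing .cntTable would still contain #.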

Your help is much appreciated.



BenjaminSchwessinger commented 4 years ago

All fixed. Closing the issue.