morrislab / qapa

RNA-seq Quantification of Alternative Polyadenylation
GNU General Public License v3.0

ValueError: invalid literal for int() with base 10: '+' #28

Sylarair closed this issue 4 years ago

Sylarair commented 4 years ago

Hi,

I am facing an issue when running "qapa build --db mm10_ensembl_identifiers.txt -g vM22.gencode.polyA_sites.bed -p atlas.clusters.mm10.2-0.bed vM22.gencode.comprehensive.no_version.txt > mm10.vM22.output_utrs.bed" with qapa-1.3.0.

The error is:

[Screenshot: Screen Shot 2019-10-09 at 6 10 09 PM]

The files used in the command are available at these links: https://www.dropbox.com/s/496qky0otm4yq4v/atlas.clusters.mm10.2-0.bed?dl=0, https://www.dropbox.com/s/vdad6chlsocv6by/mm10_ensembl_identifiers.txt?dl=0, https://www.dropbox.com/s/9pcp8432g77f5g9/vM22.gencode.comprehensive.no_version.txt?dl=0, https://www.dropbox.com/s/rdz99cblq0t0w5f/vM22.gencode.polyA_sites.bed?dl=0. Can anyone help me with this? Many thanks!

kcha commented 4 years ago

The error occurs because the vM22.gencode.comprehensive.no_version.txt file is missing the header row that UCSC provides when you download from the Table Browser.

Also, I'm surprised you were able to install v1.3.0 on Python 2, which is no longer supported in this version.

Sylarair commented 4 years ago

Thanks for your answers. However, when I replace 'vM22.gencode.comprehensive.no_version.txt' with 'vM22.gencode.basic.txt' and switch the running environment to Python 3.6.7, I still get a similar error. It seems to read the 4th column instead of the 5th (self.txStart = int(l[4])).

[Screenshot: Screen Shot 2019-10-11 at 9 52 46 PM]

The 'vM22.gencode.basic.txt' file is here: https://www.dropbox.com/s/5gkj5r7h5x7qxih/vM22.gencode.basic.txt?dl=0

kcha commented 4 years ago

Your header is missing the # character at the beginning, which is present in UCSC Table Browser output (e.g. #bin). QAPA looks for this character to determine whether the input is from UCSC or a custom file. In the future, I will look into making the format recognition more flexible. For now, please use the header unmodified.
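
For a file whose header has lost the leading #, one possible workaround is to add it back before running qapa build. The snippet below is only a minimal sketch and is not part of QAPA: the function name ensure_hash_header and the output file name are hypothetical, and it assumes the first line of the file is the header row.

```python
import shutil


def ensure_hash_header(src_path, dst_path):
    """Copy src_path to dst_path, prefixing the header line with '#' if absent.

    Hypothetical helper, not part of QAPA. Assumes the first line of the
    input is the header row. Returns True if a '#' was added.
    """
    with open(src_path, "r", encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        header = src.readline()
        # Only touch a non-empty header that does not already start with '#'.
        added = bool(header) and not header.startswith("#")
        if added:
            header = "#" + header
        dst.write(header)
        # Stream the remaining rows through unchanged.
        shutil.copyfileobj(src, dst)
    return added


# Example call (the output file name is an assumption):
# ensure_hash_header("vM22.gencode.basic.txt", "vM22.gencode.basic.hash.txt")
```

The function writes to a separate file so that the original download stays untouched, and it leaves the header alone when the # is already there.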

Sylarair commented 4 years ago

Thanks for your reply. I tried what you suggested, and it worked.

Also, I got vM22.gencode.basic.txt with mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e "select * from wgEncodeGencodeBasicVM22" mm10 > gencode.basic.txt, and the header of that file has no "#".

kcha commented 4 years ago

This looks like a bug in the new version, as this should have been working before. Thank you for catching this!