ropensci / biomartr

Genomic Data Retrieval with R
https://docs.ropensci.org/biomartr

read_cds fails when returning in data.table format #57

Closed: clementfkent closed this issue 4 years ago.

clementfkent commented 4 years ago

I installed biomartr today. read_cds(file = "testCDS.txt", format = "fasta", obj.type = "data.table") fails with the message: "Error: File testCDS.txt could not be read properly. Please make sure that testCDS.txt contains only CDS sequences and is in fasta format."

testCDS.txt and a revised version of read_cds (read_cds2.txt) are attached. The revised version works well. NB: that file only contains the code for the data.table case; I didn't test the Biostrings case.
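For readers unfamiliar with the data.table case, here is a minimal base-R sketch of what it involves: parsing FASTA lines into a table with one row per sequence. The helper parse_fasta_lines() and the column names geneids and seqs are hypothetical, chosen for illustration; this is neither biomartr's read_cds() code nor the attached revision.

```r
# Hypothetical helper (not biomartr code): parse FASTA lines into a
# data.frame with one row per record. Column names are illustrative.
parse_fasta_lines <- function(lines) {
  lines <- trimws(lines)                    # also strips trailing "\r"
  lines <- lines[nzchar(lines)]             # drop blank lines
  is_header <- startsWith(lines, ">")
  if (length(lines) == 0 || !is_header[1]) {
    stop("Input is not FASTA: the first non-empty line must start with '>'")
  }
  # Every line belongs to the record opened by the most recent header
  record_id <- cumsum(is_header)
  # Use the first word of each header line as the sequence ID
  ids <- sub("\\s.*$", "", sub("^>\\s*", "", lines[is_header]))
  # Concatenate the (possibly wrapped) sequence lines of each record
  seqs <- vapply(seq_along(ids), function(i) {
    paste0(lines[record_id == i & !is_header], collapse = "")
  }, character(1))
  data.frame(geneids = ids, seqs = toupper(seqs), stringsAsFactors = FALSE)
}

# Usage with inline data; for the attached file, pass readLines("testCDS.txt")
cds <- parse_fasta_lines(c(">gene1 some description", "ATGAAA", "TAA",
                           ">gene2", "ATGCC"))
# cds$geneids is c("gene1", "gene2"); cds$seqs is c("ATGAAATAA", "ATGCC")
```

If the data.table package is installed, the result can be converted with data.table::as.data.table(cds).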


HajkD commented 4 years ago

Hi Clement,

Many thanks for making me aware of this issue. To be honest, I was planning to deprecate the data.table feature of read_cds(). Do you find it more useful than the Biostrings version for your application? If so, I will take a deeper look and fix any issues that may exist in the current version.

Many thanks, Hajk

clementfkent commented 4 years ago

Thanks for the reply, Hajk. I only began using the package recently. My R code is heavily based on data.table and data.frame objects, so I am most comfortable with that side of it.

Cheers, Clement

Clement Kent, Ph.D., Adjunct Professor, Dept. of Biology, York University, Canada

Replying by email to Hajk-Georg Drost's comment of Mon, Jul 27, 2020, 3:41 PM.

HajkD commented 4 years ago

Dear Clement,

Excellent. In that case, I will look into it and make sure that this feature works smoothly.

Many thanks for your feedback.

Cheers, Hajk

HajkD commented 4 years ago

Dear Clement,

I found the issue and fixed it. In addition, I added a more extensive check for CDS sequences whose lengths are not divisible by 3 (which, sadly, occur quite frequently in NCBI/ENSEMBL datasets).
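A coding sequence made of whole codons has a length that is a multiple of 3, so a check like the one described can flag any sequence with nchar(seq) %% 3 != 0. The base-R sketch below is hypothetical and is not biomartr's implementation: check_cds_lengths() and its assumed input (a data.frame with geneids and seqs columns) are illustrative, and its delete_corrupt argument only mirrors the name used in the read_cds() call in this comment.

```r
# Hypothetical sketch (not biomartr's implementation): flag coding
# sequences whose length is not a multiple of 3, optionally dropping them.
check_cds_lengths <- function(cds, delete_corrupt = FALSE) {
  # cds: data.frame with character columns 'geneids' and 'seqs' (assumed)
  corrupt <- nchar(cds$seqs) %% 3 != 0
  if (any(corrupt)) {
    message(sum(corrupt), " of ", nrow(cds),
            " sequences have a length that is not a multiple of 3: ",
            paste(cds$geneids[corrupt], collapse = ", "))
  }
  # Keep every row unless the caller asked for corrupt rows to be removed
  if (delete_corrupt) cds[!corrupt, , drop = FALSE] else cds
}

cds <- data.frame(geneids = c("gene1", "gene2"),
                  seqs = c("ATGAAATAA", "ATGCC"),
                  stringsAsFactors = FALSE)
clean <- check_cds_lengths(cds, delete_corrupt = TRUE)
# Reports gene2 (length 5) and returns only the gene1 row
```

Reporting the affected IDs before dropping anything lets the caller decide whether discarding partial CDS entries is acceptable for their analysis.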

Many thanks for pointing out this issue; I hope the function now works for you.

biomartr::read_cds(file = "testCDS.txt", obj.type = "data.table", delete_corrupt = TRUE)

Best wishes, Hajk