missuse / ragp

Filter plant hydroxyproline rich glycoproteins
MIT License

AAstringset inputs #5

Closed TS404 closed 2 years ago

TS404 commented 6 years ago

Ideally, the functions get_phobius_file, get_signalp_file, and get_targetp_file should be a bit more flexible about their input (i.e. accept the AAStringSet format).

missuse commented 6 years ago

AAStringSet will be added as an input format for all functions, and the *_file functions mentioned above will be changed to accept the same objects as the other functions. This will be deployed some time next week.

missuse commented 6 years ago

Adding AAStringSet as an input to the functions has turned out to require a design decision that I am not sure how to make. To process objects of this class, I would need to rely on the BiocGenerics package, which should go in Suggests. Since it is not available on CRAN, I would need to specify it under Remotes, and a package with Remotes specified cannot pass the CRAN check. So if I add it now, I will either have to remove it when I decide to push to CRAN (not a good option) or never push to CRAN (also not a good option). There is a third option: to assume that every user who passes AAStringSet objects to the functions already has BiocGenerics installed (this is probably true in 99.99% of cases).

Do you have any thoughts on the matter?

On a brighter note, the functions get_signalp, get_targetp and get_phobius have been added. They accept the same input objects as scan_ag.

Usage:

signalp_pred <- get_signalp(data = at_nsp[1:50,],
                            sequence,
                            Transcript.id)

TS404 commented 6 years ago

A proposed solution: it's relatively easy to convert from an AAStringSet to the seqinr list format. Perhaps each function could convert the sequence input to a list of strings as its first step. That way it's fully compatible with Bioconductor, without requiring BiocGenerics. Something like:

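# Read a FASTA file into an AAStringSet (file.fa is a placeholder path)
seq.AAstringset <- Biostrings::readAAStringSet(file.fa)

# Convert to a seqinr-style list: one vector of single residues per sequence
seq.list <- strsplit(as.character(seq.AAstringset), "")

missuse commented 6 years ago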

I am not sure how. The S4 generics such as as.vector and unlist that work on XStringSet-class are in BiocGenerics.
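
For illustration only, here is a minimal sketch of the "third option" discussed above: check for the Bioconductor package at run time instead of declaring it as a hard dependency. The helper name as_sequence_vector is hypothetical and is not part of ragp. The AAStringSet branch assumes that loading the Biostrings namespace (which itself depends on BiocGenerics) makes as.character work on the object; the other branches use base R only. Data frame input is not handled here.

```r
# Hypothetical helper (not part of ragp): normalise a few input types to a
# named character vector holding one string per sequence.
as_sequence_vector <- function(x) {
  if (inherits(x, "AAStringSet")) {
    # Biostrings is only needed for this branch, so look for it at run time
    # rather than listing it as a hard dependency.
    if (!requireNamespace("Biostrings", quietly = TRUE)) {
      stop("Biostrings is required to process AAStringSet objects")
    }
    # Assumption: with the namespace loaded, as.character dispatches to the
    # Biostrings method for this class.
    return(as.character(x))
  }
  if (is.list(x)) {
    # seqinr-style list: one vector of single residues per sequence.
    return(vapply(x, paste, character(1), collapse = ""))
  }
  if (is.character(x)) {
    # Already one string per sequence.
    return(x)
  }
  stop("unsupported input class: ", class(x)[1])
}
```

With something like this, each function could call the helper once on its sequence input and work with plain character vectors afterwards, so users who never pass an AAStringSet would not need any Bioconductor package installed.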

missuse commented 2 years ago

I decided a long time ago that it's OK to never publish on CRAN, so I have no excuse for not implementing this until now. If you encounter any bugs, please let me know.