rformassspectrometry / QFeatures

Quantitative features for mass spectrometry data
https://RforMassSpectrometry.github.io/QFeatures/

readQFeatures no longer supports reading from a .txt file #213

Open lmsimp opened 7 months ago

lmsimp commented 7 months ago

Hi Laurent,

I have been trying to use the readQFeatures function in the latest release to read in data from an external file, e.g. a third-party .txt file. To date, this is what we have been doing in our workflows when using QFeatures, but I now see that this is no longer supported and is only an option with readSummarizedExperiment?

Would you consider adding the option back to readQFeatures for users so we can create a QFeatures object directly from an external file as per readSummarizedExperiment?

Best,

Lisa


An example,

This works perfectly as the data is already a data.frame

## Get an example PSM file from QFeatures
data("hlpsms")

## Create QF object
qf1 <- readQFeatures(hlpsms, quantCols = 1:10, name = "psms")

Create an example .csv and write it locally as test data

## Now write this data to a .csv as example data to read 
write.csv(hlpsms, file = "hlpsms.csv")

## Check the structure of the .csv
csv <- read.csv(file = "hlpsms.csv")
csv[1:3, 1:3]

# > csv[1:3, 1:3]
# X126      X127C      X127N
# 1 0.12283431 0.08045915 0.07080406
# 2 0.35268185 0.14162381 0.16752388
# 3 0.01546089 0.16142297 0.08693813
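A side note on the column indices used further down: by default, write.csv() also writes the row names, and read.csv() reads them back as an extra first column named X, which is why the quantitative columns end up at 2:11 rather than 1:10. The sketch below illustrates this with a made-up two-column table (the values and object names are placeholders, not part of hlpsms); it only uses base R.

```r
## Toy table standing in for the quantitative columns (made-up values)
df <- data.frame(X126 = c(0.12, 0.35), X127C = c(0.08, 0.14))

f_default <- tempfile(fileext = ".csv")
f_norn <- tempfile(fileext = ".csv")

## Default: row names are written as an unnamed first column
write.csv(df, f_default)
## row.names = FALSE: only the data columns are written
write.csv(df, f_norn, row.names = FALSE)

cols_default <- names(read.csv(f_default))
cols_norn <- names(read.csv(f_norn))

cols_default  # "X" "X126" "X127C"
cols_norn     # "X126" "X127C"
```

Writing the file with row.names = FALSE keeps the column positions identical to those of the original data.frame.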

If I now try and read from a .csv file I get the following errors

## specify file name
f <- "hlpsms.csv"

grep("X1", names(read.csv(f, sep = ",")))
# [1]  2  3  4  5  6  7  8  9 10 11

## Looks good, quant data is now in 2:11, try to read this data
qf2 <- readQFeatures(f, quantCols = 2:11)

# Checking arguments.
# Error in .checkQuantCols(assayData, colData, quantCols) : 
#   Some column names in 'quantCols' are not found in 'assayData': NA, NA, NA, NA, NA, NA, NA, NA, NA, NA.

The same happens if a character vector is specified for quantCols

(id_character <- grep("X1", names(read.csv(f, sep = ",")), value = TRUE))
# [1] "X126"  "X127C" "X127N" "X128C" "X128N" "X129C" "X129N" "X130C" "X130N" "X131" 
qf2 <- readQFeatures(f, quantCols = id_character)

# Checking arguments.
# Error in .checkQuantCols(assayData, colData, quantCols) : 
#   Some column names in 'quantCols' are not found in 'assayData': X126, X127C, X127N, X128C, X128N, X129C, X129N, X130C, X130N, X131.

Works perfectly for SEs

se <- readSummarizedExperiment(f, quantCols = 2:11)

> sessionInfo()
R version 4.4.0 (2024-04-24)
Platform: aarch64-apple-darwin20
Running under: macOS Ventura 13.2

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] MSnbase_2.30.1              ProtGenerics_1.36.0         mzR_2.38.0                 
 [4] Rcpp_1.0.12                 QFeatures_1.14.0            MultiAssayExperiment_1.30.0
 [7] SummarizedExperiment_1.34.0 Biobase_2.64.0              GenomicRanges_1.56.0       
[10] GenomeInfoDb_1.40.0         IRanges_2.38.0              S4Vectors_0.42.0           
[13] BiocGenerics_0.50.0         MatrixGenerics_1.16.0       matrixStats_1.3.0

lgatto commented 7 months ago

What about

> qf <- read.csv("hlpsms.csv") |> readQFeatures(quantCols = 2:11)
Checking arguments.
Loading data as a 'SummarizedExperiment' object.
Formatting sample annotations (colData).
Formatting data as a 'QFeatures' object.

Charl-Hutchings commented 7 months ago

Agree with Lisa. The readQFeatures function was widely used in our lab, including in published workflows and courses that we designed using QFeatures. I think that one of the great things about QF is the simplicity, so this additional step seems to me unnecessary. It'd be great if the functionality to read directly from files could be added back.

lgatto commented 7 months ago

I will look into it, but let me highlight some other aspects of your request:

You say that

"I think that one of the great things about QF is the simplicity, so this additional step seems to me unnecessary."

With regard to readQFeatures(), the example shown above is trivial, which is something you seem to appreciate. But we have files that are read and parsed into hundreds of assays. This is a situation that needed to be (and has been) simplified, in particular with regard to sample meta-data incorporation. In such non-trivial situations, we never read the data directly from a file; in the simplest cases, using readSummarizedExperiment() also just works.

You would like the old behaviour to be added. But please, keep in mind that this seemingly simple request adds some non-trivial work on my busy plate. I don't know how easy it will be to add what you want, but I will probably have to:

From what I gathered from the original request, the only thing that you need to change on your side is the suggestion I proposed:

readQFeatures(file, quantCols = e)

to

read.csv(file) |> readQFeatures(quantCols = e)

This applies to your teaching material [*], the F1000research paper (which you can update to a new version) and possibly more. But there is no way for me to guarantee backwards compatibility for code/material I don't maintain/know about. But in general, I do make efforts to maintain backwards compatibility.

And last but not least, which might be good news despite this issue: we are working on a dedicated import app that aims to make data import (especially the more difficult cases) as easy as possible.

[*] You should update your material anyway and start using quantCols instead of ecols, which is likely going to be deprecated in a release or two.

Charl-Hutchings commented 6 months ago

Hi Laurent,

Completely understand how much work it is to develop/maintain all of your packages and appreciate that no task is trivial. We were simply disappointed to see what we had considered very useful functionality removed.

lgatto commented 6 months ago

Don't forget that you can always define your own wrapper:

> readQFeaturesCCP <- function(filename, sep = ",", quantCols, ...) 
    read.csv(filename, sep = sep) |> 
    readQFeatures(quantCols = quantCols, ...)
> readQFeaturesCCP("hlpsms.csv", quantCols = 2:11)
Checking arguments.
Loading data as a 'SummarizedExperiment' object.
Formatting sample annotations (colData).
Formatting data as a 'QFeatures' object.
An instance of class QFeatures containing 1 assays:
 [1] quants: SummarizedExperiment with 3010 rows and 10 columns 
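Since the original request mentions third-party .txt files, the same idea can be sketched with the reader passed in as an argument. This is only an illustration under assumptions: readQFeaturesFromFile and its reader argument are hypothetical names, not part of QFeatures, and tab-delimited input read with utils::read.delim() is assumed as the default.

```r
## Hypothetical helper (not part of QFeatures): read a table from a file
## with a user-chosen reader, then hand the data.frame to readQFeatures().
readQFeaturesFromFile <- function(filename, quantCols,
                                  reader = utils::read.delim, ...) {
    stopifnot(file.exists(filename))
    ## Import the table first, so that it can be inspected if needed
    tab <- reader(filename)
    ## Additional arguments in ... are passed on to readQFeatures()
    QFeatures::readQFeatures(tab, quantCols = quantCols, ...)
}

## Example calls (require QFeatures and the files to exist):
## readQFeaturesFromFile("hlpsms.csv", quantCols = 2:11,
##                       reader = utils::read.csv)
## readQFeaturesFromFile("psms.txt", quantCols = 2:11)  # tab-delimited
```

Keeping the reader as an argument leaves the choice of separator and parsing options with the caller.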

cvanderaa commented 6 months ago

(pinging @StijnVandenbulcke who raised the same issue to me.)

Hi @lmsimp and @Charl-Hutchings,

I'm sorry to hear that our recent changes negatively affected your teaching and research material.

However, I agree with the points mentioned by Laurent. The refactoring of readQFeatures() resulted from quite some discussion, and we deliberately decided to remove the functionality to read tables from files. In my experience and usage of readQFeatures(), I always use read.table() (or similar functions) because i. I always double check that the table was correctly imported; ii. I always forget what quantCols should be; iii. I usually use grep() on the imported column names, or look at the column indexing after printing the colnames to the console, when defining quantCols, which requires the data to be already imported. I would argue this is best practice, and in fact, this is also what you demonstrated in your recent workflow paper.
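The import-then-inspect workflow described above could look roughly like the sketch below. The toy table, its values and the "^X1" pattern are made up for illustration; the call to readQFeatures() is guarded so that the base R part also runs without QFeatures installed.

```r
## Toy table standing in for a third-party export (made-up values)
toy <- data.frame(Sequence = c("PEPA", "PEPB", "PEPC"),
                  X126 = c(0.12, 0.35, 0.02),
                  X127N = c(0.07, 0.17, 0.09),
                  X127C = c(0.08, 0.14, 0.16))
f <- tempfile(fileext = ".csv")
write.csv(toy, f, row.names = FALSE)

## i. Import the table and check that it was read as expected
tab <- read.csv(f)
stopifnot(nrow(tab) == 3)
head(tab)

## iii. Define quantCols from the imported column names
quantCols <- grep("^X1", colnames(tab), value = TRUE)
quantCols

## Build the QFeatures object from the already imported data.frame
if (requireNamespace("QFeatures", quietly = TRUE)) {
    qf <- QFeatures::readQFeatures(tab, quantCols = quantCols, name = "psms")
}
```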

Of course, this is my opinion based on a small sample (a few lab members/collaborators and me). I would be open to spending some time on bringing back import from file if you could provide us with a convincing use case.