lindsayrutter / bigPint

Bioconductor package that makes BIG data pint-sized.
https://lindsayrutter.github.io/bigPint/

Error in order(metricPair[threshVar]) : argument 1 is not a vector #6

Closed wegnerce closed 3 years ago

wegnerce commented 4 years ago

Hi Lindsay,

I recently came across your paper, which I really liked, and started to play around with bigPint using some of my own data, following the recommended RNAseq pipeline.

I'm stuck at step 3. I can generate the parallel coordinate plot, but the clustering always fails with "Error in order(metricPair[threshVar]) : argument 1 is not a vector". I have a gut feeling that my dataMetrics object is causing the trouble, but at first glance it looks properly formatted.

Do you have any idea what is causing this issue? I pre-processed my data with edgeR.

Best, Carl-Eric

lindsayrutter commented 4 years ago

Hello Carl-Eric:

Thank you for your interest in bigPint and I am happy to help figure out the root of this error. It sounds like you saw this error while running the recommended RNAseq pipeline at the following line:

ret <- plotPCP(data_st, dataMetrics, threshVal = 0.1, lineSize = 0.3,
  lineColor = "magenta", saveFile = FALSE, hover = TRUE)

I share your suspicion that your dataMetrics object may be the cause. I would recommend checking:

1) That the structure of your dataMetrics object is a list of one or more data frames. If you only have two treatment groups, then your dataMetrics object should be a list of one data frame element as seen here. If you have more than two treatment groups, then your dataMetrics object should be a list of data frame elements, where the number of data frame elements equals the number of pairwise combinations between treatment groups. An example of that is our data that had three treatment groups ("S1", "S2", "S3") which can be combined three ways ("S1_S2", "S1_S3", "S2_S3"), see here.

2) That the name of each list element in your dataMetrics object is of the format "GroupName1_GroupName2". For the example data with two treatment groups ("N" and "P"), there is one dataMetrics list element and it is called "N_P" (see here). For the example data with three treatment groups ("S1", "S2", and "S3"), there are three dataMetrics list elements and they are called "S1_S2", "S1_S3", and "S2_S3" (see here).

3) That each list element in your dataMetrics object has a variable that can be thresholded (threshVar) for significance (choosing differentially expressed genes). By default, bigPint uses the variable name "FDR", although this can be tailored. If you used edgeR, then you probably have an "FDR" column. You can see in the example dataMetrics objects that each list element contains one "FDR" column.

If any of the above suggestions remain unclear, feel free to share your output when you run str() on both your data and dataMetrics objects. This may be the best way for me to help troubleshoot. Thank you!
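The three checks above can be sketched in base R. This is only an illustration: `checkDataMetrics` is a hypothetical helper and not part of bigPint, and the name pattern it tests assumes that group names themselves contain no underscores.

```r
# Hypothetical helper (not a bigPint function): runs the three checks above
# and returns a character vector describing any problems it finds.
checkDataMetrics <- function(dataMetrics, threshVar = "FDR") {
  # 1) Must be a plain list (not itself a data frame) of data frames.
  if (!is.list(dataMetrics) || is.data.frame(dataMetrics)) {
    return("dataMetrics is not a list of data frames")
  }
  problems <- character(0)
  isDF <- vapply(dataMetrics, is.data.frame, logical(1))
  if (!all(isDF)) {
    problems <- c(problems, "not every list element is a data frame")
  }

  # 2) Element names should look like "GroupName1_GroupName2".
  #    Assumption: group names contain no underscores.
  nms <- names(dataMetrics)
  if (is.null(nms) || !all(grepl("^[^_]+_[^_]+$", nms))) {
    problems <- c(problems, "element names are not of the form Group1_Group2")
  }

  # 3) Every data frame needs the column used for thresholding.
  hasVar <- vapply(dataMetrics[isDF],
                   function(df) threshVar %in% names(df), logical(1))
  if (!all(hasVar)) {
    problems <- c(problems, paste0("missing column '", threshVar, "'"))
  }
  problems
}

# Toy object shaped like the two-group example (values are made up).
dataMetrics <- list(
  Py_PyLa = data.frame(ID = c("g1", "g2"),
                       logFC = c(-3.32, 1.33),
                       FDR = c(2.8e-72, 0.4))
)
checkDataMetrics(dataMetrics)
```

An empty result means none of the three problems was detected. It does not rule out other mismatches between data and dataMetrics.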

Lindsay

wegnerce commented 4 years ago

Hi Lindsay,

Thanks for getting back to me so quickly. I had a look at the pages dedicated to the data and dataMetrics objects, but I see no obvious problem. Below I paste the output from str() for both.

dataMetrics

List of 1
$ Py_PyLa:'data.frame': 4261 obs. of 7 variables:
..$ ID : chr [1:4261] "RHAL1_01396" "RHAL1_02998" "RHAL1_00217" "RHAL1_03439" ...
..$ Length: int [1:4261] 369 402 2748 1437 663 309 450 756 549 1590 ...
..$ logFC : num [1:4261] -3.32 -2.11 1.33 -3.38 1.49 ...
..$ logCPM: num [1:4261] 7.82 12.64 7.2 3.69 10.54 ...
..$ LR : num [1:4261] 339.9 185.1 112.2 104.8 82.8 ...
..$ PValue: num [1:4261] 6.57e-76 3.67e-42 3.30e-26 1.32e-24 8.94e-20 ...
..$ FDR : num [1:4261] 2.80e-72 7.81e-39 4.68e-23 1.41e-21 7.62e-17 ...

data

'data.frame':   4261 obs. of  7 variables:
$ ID : chr "RHAL1_00001" "RHAL1_00002" "RHAL1_00003" "RHAL1_00004" ...
$ PyLa.1: num 7.73 2.33 4.54 6.64 7.68 ...
$ PyLa.2: num 7.59 2.62 4.89 6.86 7.42 ...
$ PyLa.3: num 7.87 3.07 5.03 6.92 7 ...
$ Py.1 : num 7.4 2.3 4.45 6.63 7.8 ...
$ Py.2 : num 7.87 3.23 4.66 6.76 7.79 ...
$ Py.3 : num 7.58 2.66 5.14 6.49 7.44 ...

Thanks

lindsayrutter commented 4 years ago

Hello Carl-Eric:

I think I understand the problem. In your data object, the columns are ordered "PyLa" followed by "Py". However, in your dataMetrics object, the list element name ("Py_PyLa") has the reverse order, i.e. "Py" followed by "PyLa". The order of each pair of groups needs to be consistent between data and dataMetrics. So, I think there are two solutions:

1) Rename the dataMetrics list element: names(dataMetrics) <- "PyLa_Py"

or

2) Reorder the data columns: data <- data[c(1,5,6,7,2,3,4)]

I will either add a more informative error for this type of issue or (better yet) have bigPint reorganize variable orders if they are different between data and dataMetrics. Please let me know if the advice above does not solve the error. Thank you!
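As a rough illustration of the mismatch and the two fixes, here is a base R sketch on toy data that mirrors the column layout in this thread. `pairOrderMatches` is a hypothetical helper, not a bigPint function. It assumes data columns are named "Group.Replicate" and that group names contain no underscores. Whether bigPint performs exactly this comparison internally is not confirmed here.

```r
# Hypothetical helper (not part of bigPint): for each dataMetrics element,
# TRUE if its two groups appear in the same order as in the data columns.
pairOrderMatches <- function(data, dataMetrics) {
  # Group order in data: drop the ID column, strip the ".replicate" suffix.
  groupsInData <- unique(sub("\\.[^.]*$", "", names(data)[-1]))
  # Group order implied by each element name, e.g. "Py_PyLa" -> Py, PyLa.
  pairs <- strsplit(names(dataMetrics), "_", fixed = TRUE)
  vapply(pairs, function(pair) {
    idx <- match(pair, groupsInData)
    !anyNA(idx) && !is.unsorted(idx)
  }, logical(1))
}

# Toy data with the same column order as in this thread (values made up).
data <- data.frame(
  ID = c("g1", "g2"),
  PyLa.1 = c(7.73, 2.33), PyLa.2 = c(7.59, 2.62), PyLa.3 = c(7.87, 3.07),
  Py.1 = c(7.40, 2.30), Py.2 = c(7.87, 3.23), Py.3 = c(7.58, 2.66)
)
dataMetrics <- list(
  Py_PyLa = data.frame(ID = c("g1", "g2"), FDR = c(0.01, 0.5))
)

pairOrderMatches(data, dataMetrics)           # mismatch: "Py_PyLa" vs PyLa, Py

# Fix 1: rename the dataMetrics list element.
dataMetricsRenamed <- dataMetrics
names(dataMetricsRenamed) <- "PyLa_Py"
pairOrderMatches(data, dataMetricsRenamed)

# Fix 2: reorder the data columns so that Py comes before PyLa.
dataReordered <- data[c(1, 5, 6, 7, 2, 3, 4)]
pairOrderMatches(dataReordered, dataMetrics)
```

Either fix alone should be enough; applying both would reintroduce the mismatch.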

lindsayrutter commented 4 years ago

Hello Carl-Eric:

I am going through older issues right now and wanted to check whether you were able to resolve the error you were having.

Thank you. Lindsay

no-response[bot] commented 3 years ago

This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate further.