Error during step5 with 10x scRNA data

i19870503 commented 3 years ago

I ran the test data without any problem, but an error popped up at step 5 when I analyzed our own data. I checked the step 4 results and found that IntegratedNet_edgeCost_common.txt contained only NA, while RootNode.txt and IntegratedNet_edge.txt appeared normal. Besides, I loaded the RData results that may be used in step 5: integratedNet_EdgeCost_common and integratedNet_GenePrize_initial in IntegratedNet_TypATypB_ID.RData were NaN. Please help me fix the problem.

I noticed that the test data are ln-transformed counts, but I used raw count data from 10x Genomics. I don't know whether this is the cause of the error. I've posted the error message below:

[1] "(4/7) Constructing the integrated gene network--Computation of NonSelfTalk score of TypeA"

Attaching package: ‘entropy’

The following objects are masked from ‘package:infotheo’:

discretize, entropy

There were 50 or more warnings (use warnings() to see the first 50)
[1] "(4/7) Constructing the integrated gene network--Computation of NonSelfTalk score of TypeB"
There were 50 or more warnings (use warnings() to see the first 50)
[1] "(4/7) Constructing the integrated gene network...(around 40 min and need 10G space)"
[1] "------Calculating relevance coefficients for genes in cell type A..."
[1] "------Calculating relevance coefficients for genes in cell type B..."
[1] "------Relevance coefficient calculation done!"
[1] "2021-08-28 14:17:16 CST"
[1] "(5/7) Generating background PCSFs...(around 20 min)"
Traceback (most recent call last):
  File "gen_PCSF.py", line 11, in <module>
    Cost = numpy.loadtxt("IntegratedNet_edgeCost.txt", dtype = 'float')
  File "/home/zhongl/miniconda3/lib/python3.7/site-packages/numpy/lib/npyio.py", line 1159, in loadtxt
    for x in read_data(_loadtxt_chunksize):
  File "/home/zhongl/miniconda3/lib/python3.7/site-packages/numpy/lib/npyio.py", line 1087, in read_data
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "/home/zhongl/miniconda3/lib/python3.7/site-packages/numpy/lib/npyio.py", line 1087, in <listcomp>
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "/home/zhongl/miniconda3/lib/python3.7/site-packages/numpy/lib/npyio.py", line 794, in floatconv
    return float(x)
ValueError: could not convert string to float: 'NA'
Traceback (most recent call last):

huBioinfo commented 3 years ago

Dear User,

I guess the problem is related to the computation of mutual information between genes. Are all the values in "IntegratedNet_edgeCost_common.txt" NA? If so, can you please check the values in your generated "MutualInfo_TypA_Para.txt" and "MutualInfo_TypB_Para.txt" in the /Output/ folder? If those values are NA as well, you should probably re-run CytoTalk using ln-transformed normalized data. I suggest using Seurat to normalize the 10x-generated raw count data with default settings, which produces ln-transformed normalized data. Please let me know if the problem still exists.
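
As a rough aid for the check suggested above, here is a minimal Python sketch that counts the tokens in a text file which are not finite numbers. It assumes the output files are whitespace-delimited numeric tables, which the thread does not confirm, and `count_non_finite` is a hypothetical helper, not part of CytoTalk.

```python
import math
from pathlib import Path


def count_non_finite(path):
    """Count NA, NaN, Inf and unparsable tokens in a whitespace-delimited text file.

    Assumption: the file holds plain numeric tokens separated by whitespace.
    """
    counts = {"total": 0, "na": 0, "nan": 0, "inf": 0, "other": 0}
    for token in Path(path).read_text().split():
        counts["total"] += 1
        if token == "NA":
            # R writes missing values as the literal string NA,
            # which is what numpy.loadtxt fails to convert.
            counts["na"] += 1
            continue
        try:
            value = float(token)
        except ValueError:
            # Anything else that is not a number (for example a header).
            counts["other"] += 1
            continue
        if math.isnan(value):
            counts["nan"] += 1
        elif math.isinf(value):
            counts["inf"] += 1
    return counts
```

Calling `count_non_finite("IntegratedNet_edgeCost_common.txt")` from the /Output/ folder would then show at a glance whether the file is entirely NA or only partly affected.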

xcliu-oc commented 3 years ago

I got similar errors.

Traceback (most recent call last):
  File "gen_PCSF.py", line 11, in <module>
    Cost = numpy.loadtxt("IntegratedNet_edgeCost.txt", dtype = 'float')
  File "/home/rstudio/.local/lib/python3.8/site-packages/numpy/lib/npyio.py", line 1148, in loadtxt
    for x in read_data(_loadtxt_chunksize):
  File "/home/rstudio/.local/lib/python3.8/site-packages/numpy/lib/npyio.py", line 999, in read_data
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "/home/rstudio/.local/lib/python3.8/site-packages/numpy/lib/npyio.py", line 999, in <listcomp>
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "/home/rstudio/.local/lib/python3.8/site-packages/numpy/lib/npyio.py", line 736, in floatconv
    return float(x)
ValueError: could not convert string to float: 'NA'
[1] "2021-08-30 13:14:19 UTC"
[1] "(6/7) Generating the final signaling network between the two cell types...(around 25 min)"
Error in { : task 1 failed - "missing value where TRUE/FALSE needed"
Calls: genSignalingNetwork ... genSummaryPCSF -> runAnalysisFile -> %dopar% ->
Execution halted

I checked my "IntegratedNet_edgeCost_common.txt" and it's all NAs, but "MutualInfo_TypA_Para.txt" and "MutualInfo_TypB_Para.txt" look good. It looks like something goes wrong when the edge cost is generated. Please advise.

huBioinfo commented 3 years ago

Hi, this seems to be a major problem related to the data. Could you please share your two intermediate files ("Exp_cleaned_2.RData" and "IntracellularNetwork_TypeA.txt") under the /Output/ folder with me via huyuxuan@xidian.edu.cn or some cloud storage? I'll look carefully into this "NA" problem. Thanks for your report.

i19870503 commented 3 years ago

I re-ran the script with ln-transformed data and still got the same error. The MutualInfo_TypA/B_Para data looked normal, with no NAs. I am now checking the process step by step and found that the results in 'typeSpecific' were Inf or NaN. They are produced by the compCrosstalk_specific function in construct_integratedNetwork.R.

huBioinfo commented 3 years ago

Hi, thanks for the information. "typeSpecific" contains NaN, Inf and real numbers, which is normal. Could you help check "IntracellularNetwork_TypeA/B.txt"? If the values in these files are not NAs either, can you share your two intermediate files ("Exp_cleaned_2.RData" and "IntracellularNetwork_TypeA.txt") with me via huyuxuan@xidian.edu.cn or some cloud storage? Thank you so much for your contribution. I really want to find out what caused the NA problem.

i19870503 commented 3 years ago

Thanks for your advice. IntracellularNetwork_TypeA/B.txt do not contain NA. Following the typeSpecific clue, I found that the compPEM function might be the source of the problem, and I have some questions from debugging it:

I loaded the Exp_allCSV_NoLog.RData file, but allExpVector_NoLog contains more than 2 objects. My input folder has 5 .csv files of RNA-seq data, which are listed in allExpFile:
```
allExpFile
[1] "scRNAseq_Endo.csv"    "scRNAseq_Endo2.csv"   "scRNAseq_Germ.csv"   
[4] "scRNAseq_Germ2.csv"   "scRNAseq_Sertoli.csv"
```
These include both ln-transformed and original raw count data for TypeA/B, and allExpVector_NoLog accordingly contains 5 data frames, one per file. I think this should be optimized to avoid meaningless loading and computing in the previous steps.
The key point may be here in compPEM: allExpVector_NoLog contains 3 Inf values, which make datasetSum Inf and cause the subsequent errors, whereas the ln-transformed data seem correct. Next, I will remove the other data and re-run with a folder that contains only ln-transformed data.

for(i in 1:5){
         print(paste("sum(Exp_tpmMean[[i]]):", sum(Exp_tpmMean[[i]]), sep = ''))
 }
[1] "sum(Exp_tpmMean[[i]]):Inf"
[1] "sum(Exp_tpmMean[[i]]):9005.20052446813"
[1] "sum(Exp_tpmMean[[i]]):Inf"
[1] "sum(Exp_tpmMean[[i]]):10744.9523383307"
[1] "sum(Exp_tpmMean[[i]]):Inf"
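
To illustrate why a single Inf is enough to break everything downstream, here is a small Python sketch with made-up values. It does not reproduce compPEM, whose formula is not shown in this thread. It only assumes that each value is at some point divided by its dataset sum, and `non_finite_datasets` is a hypothetical helper.

```python
import math


def non_finite_datasets(datasets):
    """Return the names of datasets whose values sum to Inf or NaN."""
    return [
        name
        for name, values in datasets.items()
        if not math.isfinite(sum(values))
    ]


# Hypothetical toy values, not taken from the issue.
datasets = {
    "dataset_with_inf": [3.0, float("inf"), 12.0],
    "dataset_ln": [0.0, 1.2, 2.4],
}

bad = non_finite_datasets(datasets)

# Dividing by an infinite sum turns finite values into 0 and Inf into NaN,
# so every ratio for the affected dataset becomes useless.
total = sum(datasets["dataset_with_inf"])
ratios = [value / total for value in datasets["dataset_with_inf"]]
```

Here `bad` lists only the dataset that contains Inf, and `ratios` holds zeros plus one NaN, which matches the mix of Inf and NaN reported for the later steps.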

huBioinfo commented 3 years ago

Thanks for the details. You're right. The /Input/ folder should contain only ln-transformed data for all cell types in the microenvironment. From your screenshot, I see you have three cell types in total: "Endo", "Germ" and "Sertoli". So the Input/ folder should contain only three scRNAseq_***.csv files. But I'm still confused by the NA values in the "IntegratedNet_edgeCost_common.txt" file, because this file contains the edge costs, which depend only on the values in "IntracellularNetwork_TypeA/B.txt". The "compPEM" function you mentioned computes cell-type specificity, which is used to compute the node prize (weight), not the edge cost. The edge cost is very simple: just min-max normalized mutual information values. Could you also please check the variable "MiList_value_TypA" in both "MI_TypA.RData" and "MI_topNet_TypA.RData"? Does this variable contain only "NA"? Thanks!
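
For readers unfamiliar with the term, min-max normalization can be sketched in a few lines of Python. This is a generic illustration, not CytoTalk's actual code, and `min_max_normalize` is a hypothetical name. Whether the pipeline applies any further transformation to get the edge cost is not stated here.

```python
import math


def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    if not all(math.isfinite(v) for v in values):
        # A single NaN or Inf would make the scaled values meaningless.
        raise ValueError("input contains NaN or Inf")
    lo, hi = min(values), max(values)
    if hi == lo:
        # Zero range: the division below would be 0 / 0.
        raise ValueError("all values are identical; scaling is undefined")
    return [(v - lo) / (hi - lo) for v in values]


# Three illustrative mutual information values.
scaled = min_max_normalize([0.4830955, 0.4608728, 0.4732309])
```

The two guards mark the only ways this formula can yield NaN: non-finite input or a zero range.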

i19870503 commented 3 years ago

After removing the other 3 samples from allExpFile and allExpVector_NoLog, IntegratedNet_edgeCost_common.txt became correct, with no NAs. However, step 5 still fails because of IntegratedNet_nodePrize.txt in the bt[xx].000000 folders: the values in IntegratedNet_nodePrize.txt are all Inf. Maybe there is some procedure I did not run for the several comp_NodePrize functions. I have just re-run the whole script. I am still digging into the cause of the error in step 4 and will share any progress. Thanks.

The results you asked for are pasted below:

MiList_value_TypA in MI_topNet_TypA.RData

> head(MiList_value_TypA,20)
 [1] 2.206825 2.042807 2.134017 2.068090 2.142557 2.018205 2.252712 2.179418
 [9] 2.227762 2.066330 2.477424 2.218102 2.237800 2.025070 2.220957 2.092494
[17] 2.044935 2.033903 2.169833 2.401331
> which(MiList_value_TypA == 'NA')
integer(0)
> which(MiList_value_TypA == 'NaN')
integer(0)
> which(MiList_value_TypA == 'Inf')
integer(0)
>

MiList_value_TypA in MI_TypA.RData

> load('MI_TypA.RData')
> head(MiList_value_TypA,20)
 [1] 0.4830955 0.4608728 0.4732309 0.4642983 0.4743879 0.4575394 0.4893128
 [8] 0.4793822 0.4859323 0.4640599 0.5197591 0.4846235 0.4872924 0.4584695
[15] 0.4850103 0.4676049 0.4611611 0.4596664 0.4780835 0.5094492
> which(MiList_value_TypA == 'NA')
integer(0)
> which(MiList_value_TypA == 'Inf')
integer(0)
> which(MiList_value_TypA == 'NaN')
integer(0)

i19870503 commented 3 years ago

Finally, I got the results successfully with ln-transformed data. I found that the cause in my case was in comp_NodePrizeCellType.R (line 4: `allExpFile <- list.files(path = InputPath, pattern = "scRNAseq")`). `allExpFile` contains all the samples in the Input folder, and the profile data in `allExpVector` and `allExpVector_NoLog` were calculated from all of them and used for the subsequent results. Since the other samples were not ln-transformed, 'Inf' was produced during `compPEM`, which finally made the node prizes in the bt[xx].000000 folders unusable.
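
A simple guard against this situation is to check the Input folder before starting a run. The Python sketch below is only an illustration under assumptions: `unexpected_inputs` is a hypothetical helper, and the `scRNAseq_<cell type>.csv` naming is inferred from the file names in this thread. It mirrors the substring match that `list.files(pattern = "scRNAseq")` performs and reports any matching file that is not one of the expected cell types.

```python
from pathlib import Path


def unexpected_inputs(input_dir, expected_cell_types, prefix="scRNAseq_"):
    """Return sorted names of files matching 'scRNAseq' that are not expected.

    Assumption: one file per cell type, named '<prefix><cell type>.csv'.
    """
    expected = {f"{prefix}{cell}.csv" for cell in expected_cell_types}
    # Like the R pattern match, any file whose name contains
    # 'scRNAseq' would be picked up by the pipeline.
    found = {
        path.name
        for path in Path(input_dir).iterdir()
        if "scRNAseq" in path.name
    }
    return sorted(found - expected)
```

An empty list means the folder holds nothing beyond the expected files. Anything returned, such as a leftover raw-count copy, would be loaded along with the rest.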

huBioinfo commented 3 years ago

Hi User, thanks for the details you provided on resolving this NA issue. I've updated the "Important Usage Tips" in README.md. Thanks for your contribution to CytoTalk.

huBioinfo / CytoTalk

Error during step5 with 10x scRNA data #11