Closed i19870503 closed 2 years ago
Dear User,
I guess the problem is related to the correctness of the mutual information computed between genes. Are all the values in "IntegratedNet_edgeCost_common.txt" NAs? If so, can you please check the values in your generated "MutualInfo_TypA_Para.txt" and "MutualInfo_TypB_Para.txt" in the /Output/ folder? If those values are NAs, maybe you should re-run CytoTalk using ln-transformed normalized data. I suggest using Seurat to normalize 10X-generated raw count data with default settings, which produces ln-transformed normalized data. Please let me know if the problem still exists.
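For readers unsure what "ln-transformed normalized data" means here: Seurat's default LogNormalize method divides each cell's counts by that cell's total, multiplies by a scale factor of 10,000, and takes the natural log of (1 + x). The Python sketch below only illustrates that formula on a tiny made-up matrix. The function name `log_normalize`, the cells-as-rows layout, and the example counts are assumptions for illustration, and it is not a substitute for running Seurat itself.

```python
import math

def log_normalize(counts, scale_factor=10_000):
    """Per-cell log-normalization: ln(1 + count / cell_total * scale_factor).

    `counts` is a list of cells, each a list of raw gene counts
    (hypothetical layout chosen for this sketch).
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            # A cell with no counts carries no signal; keep zeros
            # instead of dividing by zero.
            normalized.append([0.0 for _ in cell])
            continue
        normalized.append([math.log1p(c / total * scale_factor) for c in cell])
    return normalized

# Two made-up cells with three genes each.
raw = [[0, 5, 95], [10, 0, 0]]
ln_data = log_normalize(raw)
```

With the default scale factor, values produced this way cannot exceed ln(10001), roughly 9.2, so much larger numbers in an input file are a hint that it still holds raw counts.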
I got similar errors.
Traceback (most recent call last):
  File "gen_PCSF.py", line 11, in <module>
    Cost = numpy.loadtxt("IntegratedNet_edgeCost.txt", dtype = 'float')
  File "/home/rstudio/.local/lib/python3.8/site-packages/numpy/lib/npyio.py", line 1148, in loadtxt
    for x in read_data(_loadtxt_chunksize):
  File "/home/rstudio/.local/lib/python3.8/site-packages/numpy/lib/npyio.py", line 999, in read_data
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "/home/rstudio/.local/lib/python3.8/site-packages/numpy/lib/npyio.py", line 999, in <listcomp>
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "/home/rstudio/.local/lib/python3.8/site-packages/numpy/lib/npyio.py", line 736, in floatconv
    return float(x)
ValueError: could not convert string to float: 'NA'
[1] "2021-08-30 13:14:19 UTC"
[1] "(6/7) Generating the final signaling network between the two cell types...(around 25 min)"
Error in { : task 1 failed - "missing value where TRUE/FALSE needed"
Calls: genSignalingNetwork ... genSummaryPCSF -> runAnalysisFile -> %dopar% -> <Anonymous>
Execution halted
I checked my "IntegratedNet_edgeCost_common.txt" and it's all NAs, but "MutualInfo_TypA_Para.txt" and "MutualInfo_TypB_Para.txt" are good. It looks like something went wrong when generating the edge cost. Please advise.
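The traceback above ends in `float(x)` failing on the string 'NA'. A quick way to see where such values sit in an output file is to try the same conversion token by token. The sketch below assumes a whitespace-delimited text file; the helper name `find_bad_tokens` and the `max_report` parameter are hypothetical and not part of CytoTalk.

```python
import math

def find_bad_tokens(path, max_report=5):
    """Return up to `max_report` (line_number, token) pairs that are
    not finite floats (e.g. 'NA', 'NaN', 'Inf')."""
    bad = []
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            for token in line.split():
                try:
                    value = float(token)
                except ValueError:
                    # Same failure that numpy.loadtxt reported for 'NA'.
                    bad.append((line_number, token))
                else:
                    # float() accepts 'NaN' and 'Inf', so flag them separately.
                    if not math.isfinite(value):
                        bad.append((line_number, token))
                if len(bad) >= max_report:
                    return bad
    return bad
```

Calling it on, for example, "IntegratedNet_edgeCost_common.txt" returns an empty list when every token is a finite number.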
Hi, this seems to be a major problem related to the data. Could you please share your two intermediate files ("Exp_cleaned_2.RData" and "IntracellularNetwork_TypeA.txt") under the /Output/ folder with me via huyuxuan@xidian.edu.cn or some cloud storage? I'll carefully look into this "NA" problem. Thanks for your report.
I re-ran the script with ln-transformed data and still got the same error. The MutualInfo_TypA/B_Para data looked normal, with no NAs. I am now checking the process step by step and found that the results in 'typeSpecific' were Inf or NaN, which were produced by the compCrosstalk_specific function in construct_integratedNetwork.R.
Hi, thanks for your information. "typeSpecific" contains NaN, Inf and real numbers, which is normal. Could you help check "IntracellularNetwork_TypeA/B.txt"? If the values in this file are not NAs either, can you share your two intermediate files ("Exp_cleaned_2.RData" and "IntracellularNetwork_TypeA.txt") with me via huyuxuan@xidian.edu.cn or some cloud storage? Thank you so much for your contribution. I really want to find out what caused the NA problem.
Thanks for your advice. IntracellularNetwork_TypeA/B.txt does not contain NA. Following the typeSpecific clue, I found that the compPEM function might be the source of the problem, and I have some observations from debugging it:
- I loaded the Exp_allCSV_NoLog.RData file, but allExpVector_NoLog contains more than 2 objects. My input folder has 5 .csv files of RNA-seq data, which are listed in allExpFile:
allExpFile
[1] "scRNAseq_Endo.csv" "scRNAseq_Endo2.csv" "scRNAseq_Germ.csv"
[4] "scRNAseq_Germ2.csv" "scRNAseq_Sertoli.csv"
These include both ln-transformed and original raw count data for typeA/B, but allExpVector_NoLog also contains 5 data frames, one per file. I think this should be optimized to avoid meaningless loading or computing in the previous step.
- The key point may be here in compPEM: allExpVector_NoLog contains 3 Inf, which makes datasetSum Inf and causes subsequent errors, while the ln-transformed data seem correct. Next I will remove the other data and re-run with a folder that only includes ln-transformed data.
for(i in 1:5){
print(paste("sum(Exp_tpmMean[[i]]):", sum(Exp_tpmMean[[i]]), sep = ''))
}
[1] "sum(Exp_tpmMean[[i]]):Inf"
[1] "sum(Exp_tpmMean[[i]]):9005.20052446813"
[1] "sum(Exp_tpmMean[[i]]):Inf"
[1] "sum(Exp_tpmMean[[i]]):10744.9523383307"
[1] "sum(Exp_tpmMean[[i]]):Inf"
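The output above shows three of the five sums evaluating to Inf. A single infinite entry is enough to cause this, and any ratio computed against that sum then collapses to 0 or NaN. The sketch below shows this with plain floating-point arithmetic. It is a generic illustration with made-up numbers and a hypothetical `shares` helper, not CytoTalk's actual compPEM computation.

```python
import math

def shares(values):
    """Return the total and each value's fraction of that total."""
    total = sum(values)
    return total, [v / total for v in values]

# All finite: the fractions behave as expected.
clean_total, clean_shares = shares([1.0, 3.0])

# One infinite entry: the total is Inf, finite entries become 0.0,
# and Inf / Inf evaluates to NaN.
bad_total, bad_shares = shares([1.0, 3.0, float("inf")])

print(clean_total, clean_shares)
print(bad_total, bad_shares)
```

Once a NaN enters a chain of calculations it propagates through every later arithmetic step, which is consistent with the downstream failures described in this thread.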
Thanks for the details. You're right. The /Input/ folder should only contain ln-transformed data of all cell types in the microenvironment. From your screenshot, I saw you have three cell types in total: "Endo", "Germ" and "Sertoli". So the Input/ folder should only contain three scRNAseq_***.csv files. But I'm still confused by the NA values in the "IntegratedNet_edgeCost_common.txt" file, because this file contains the edge cost, which is only related to the values in "IntracellularNetwork_TypeA/B.txt". The "compPEM" function you mentioned computes cell-type specificity, which is used to compute the node prize (weight), not the edge cost. The edge cost is very simple: just min-max normalized mutual information values. Could you also please check the variable "MiList_value_TypA" in both "MI_TypA.RData" and "MI_topNet_TypA.RData"? Does this variable contain only "NA"? Thanks!
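As a rough illustration of the min-max normalization mentioned above, the sketch below rescales a list of mutual information values to the range [0, 1]. The function name and sample values are hypothetical, and how CytoTalk turns the normalized value into the final edge cost is not stated in this thread. The NaN branch mirrors R's behavior, where min() and max() return NA if any input is NA (unless na.rm = TRUE), so one missing value can turn every normalized value into NA.

```python
import math

def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    if any(math.isnan(v) for v in values):
        # Mirrors R: a single missing value makes min/max, and therefore
        # every normalized value, missing.
        return [math.nan] * len(values)
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; min-max scaling is undefined")
    return [(v - lo) / (hi - lo) for v in values]

# Made-up mutual information values.
mi = [2.0, 2.5, 3.0]
scaled = min_max_normalize(mi)
```

Here `scaled` spans 0 to 1, while passing a list that contains even one NaN returns a list made entirely of NaN.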
After removing the other 3 samples from allExpFile and allExpVector_NoLog, IntegratedNet_edgeCost_common.txt is now correct and contains no NA. However, step 5 still fails because of IntegratedNet_nodePrize.txt in the bt[xx].000000 folders: the values in IntegratedNet_nodePrize.txt are all Inf. Maybe I skipped some procedure for the several comp_NodePrize functions. I have just re-run the whole script and am still digging into the cause of the error in step 4. I will share the information if I make any progress, thanks.
The results you asked for are pasted below:
MiList_value_TypA in MI_topNet_TypA.RData
> head(MiList_value_TypA,20)
[1] 2.206825 2.042807 2.134017 2.068090 2.142557 2.018205 2.252712 2.179418
[9] 2.227762 2.066330 2.477424 2.218102 2.237800 2.025070 2.220957 2.092494
[17] 2.044935 2.033903 2.169833 2.401331
> which(MiList_value_TypA == 'NA')
integer(0)
> which(MiList_value_TypA == 'NaN')
integer(0)
> which(MiList_value_TypA == 'Inf')
integer(0)
>
MiList_value_TypA in MI_TypA.RData
> load('MI_TypA.RData')
> head(MiList_value_TypA,20)
[1] 0.4830955 0.4608728 0.4732309 0.4642983 0.4743879 0.4575394 0.4893128
[8] 0.4793822 0.4859323 0.4640599 0.5197591 0.4846235 0.4872924 0.4584695
[15] 0.4850103 0.4676049 0.4611611 0.4596664 0.4780835 0.5094492
> which(MiList_value_TypA == 'NA')
integer(0)
> which(MiList_value_TypA == 'Inf')
integer(0)
> which(MiList_value_TypA == 'NaN')
integer(0)
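The console checks above compare a numeric vector with the strings 'NA', 'NaN' and 'Inf'. In R, a comparison involving NA yields NA, which `which()` silently drops, so `is.na()`, `is.nan()` and `is.infinite()` are the more reliable predicates. A similar pitfall exists in Python, where NaN never compares equal to anything, including itself. The sketch below uses made-up values and a hypothetical helper name to show the difference.

```python
import math

def count_non_finite(values):
    """Count NaN and infinite entries using dedicated predicates."""
    counts = {"nan": 0, "inf": 0}
    for v in values:
        if math.isnan(v):
            counts["nan"] += 1
        elif math.isinf(v):
            counts["inf"] += 1
    return counts

# Made-up values containing one NaN and two infinities.
sample = [0.48, float("nan"), 0.47, float("inf"), float("-inf")]

# Equality never matches NaN, so this list is always empty.
naive_hits = [v for v in sample if v == float("nan")]

report = count_non_finite(sample)
```

The equality-based search finds nothing even though a NaN is present, whereas the predicate-based count reports it.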
Finally, I got the results successfully with ln-transformed data.
The cause in my case was located in comp_NodePrizeCellType.R (line 4: `allExpFile <- list.files(path = InputPath, pattern = "scRNAseq")`). allExpFile contains all the samples in the Input folder, and the profile data were calculated for allExpVector and allExpVector_NoLog, which were used for subsequent results. Since the other samples were not ln-transformed, 'Inf' was produced during compPEM, which finally made the node prizes in the bt[xx].000000 folders unusable (all Inf).
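Since every file whose name matches "scRNAseq" is picked up from the Input folder, one way to guard against stray files is to compare the folder contents with the cell types you expect before running the pipeline. The sketch below is a hypothetical pre-flight check, not part of CytoTalk. It assumes files are named scRNAseq_<CellType>.csv and uses a glob that is narrower than the regular-expression match done by `list.files`.

```python
from pathlib import Path

def check_input_folder(input_dir, expected_cell_types):
    """Compare scRNAseq_*.csv files in `input_dir` with the expected cell types.

    Returns (files_found, unexpected_files, missing_files).
    """
    files = sorted(p.name for p in Path(input_dir).glob("scRNAseq_*.csv"))
    expected = sorted(f"scRNAseq_{name}.csv" for name in expected_cell_types)
    unexpected = [f for f in files if f not in expected]
    missing = [f for f in expected if f not in files]
    return files, unexpected, missing
```

For the folder described in this thread, calling it with the cell types "Endo", "Germ" and "Sertoli" would flag scRNAseq_Endo2.csv and scRNAseq_Germ2.csv as unexpected.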
Hi User, thanks for the details you provided on addressing this NA issue. I've already updated the "Important Usage Tips" in the README.md. Thanks for your contribution to CytoTalk.
I ran the test data without any problem; however, the error popped up at step 5 when I analyzed our own data. I checked the results of step 4 and found that IntegratedNet_edgeCost_common.txt contained only NA, while RootNode.txt and IntegratedNet_edge.txt appeared to be normal. Besides, I loaded the RData results that may be used for step 5: integratedNet_EdgeCost_common and integratedNet_GenePrize_initial in IntegratedNet_TypATypB_ID.RData were NaN. Please help me fix the problem.
I noticed that the test data were ln-transformed counts, but I used the raw count data from 10x Genomics. I don't know whether that is the key point for the error. I post the error message below: