Open ewijaya opened 4 years ago
Hi Edward, As you note, when running ICGS (currently version 2) with a command like the one you specified, the Guide3 results in the ICGS folder and the final NMF-defined clusters (the marker gene heatmap, which typically has many more clusters) will also have cell-type predictions. Indeed, these are much better in the current version, which includes marker genes for thousands of cell-type-specific signatures. First, I would confirm that you are getting the ICGS-NMF folder, which provides the primary results for ICGS2. The command you are using is fine, but is typically too stringent for large droplet sequencing experiments. For example:
python AltAnalyze.py --platform RNASeq --species Mm --excludeCellCycle no --removeOutliers yes --ChromiumSparseMatrix /Users/saljh8/DemoData/mouse.h5 --output /Users/saljh8/DemoData/ --runICGS yes --expname test
The more verbose version of this command (displaying default options) is:
python AltAnalyze.py --platform RNASeq --species Mm --restrictBy protein_coding --excludeCellCycle no --removeOutliers yes --ChromiumSparseMatrix "/Users/saljh8/DemoData/mouse.h5" --output "/Users/saljh8/DemoData/" --runICGS yes --expname test --downsample 2500 --column_method hopach --column_metric cosine --rho 0.2 --ExpressionCutoff 1 --FoldDiff 4 --SamplesDiffering 4 --restrictBy protein_coding --numVarGenes 500 --numGenesExp 500
ICGS2 applies a dynamic correlation cutoff which begins at 0.2 and increases by 0.1 if > 5000 correlated variable genes are obtained.
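The dynamic cutoff described above can be illustrated with a minimal Python sketch. This is not AltAnalyze's implementation: `count_correlated_genes` is a hypothetical callback standing in for whatever step counts the correlated variable genes at a given rho, and both the repeated incrementing and the `max_rho` ceiling are assumptions added for illustration.

```python
def choose_rho(count_correlated_genes, start=0.2, step=0.1,
               max_genes=5000, max_rho=0.9):
    """Raise the correlation cutoff until the gene count is manageable.

    count_correlated_genes: hypothetical callable, rho -> number of
    correlated variable genes found at that cutoff.
    max_rho: assumed ceiling so the loop always terminates.
    """
    rho = start
    # Keep stepping while too many genes pass and the ceiling allows it.
    while count_correlated_genes(rho) > max_genes and rho + step <= max_rho + 1e-9:
        rho = round(rho + step, 10)  # avoid floating-point drift (0.30000000000000004)
    return rho


# Toy counts for illustration only (not real data).
toy_counts = {0.2: 9000, 0.3: 6000, 0.4: 3000}
print(choose_rho(lambda rho: toy_counts[round(rho, 1)]))
```

Passing `--rho` on the command line, as in the verbose command above, sets the starting value.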
If you had a tab-delimited file with counts, you would add: --dataFormat counts
If you want to force a specific number of target clusters, add: --k 23
If the enriched blue terms do not display in these results (Guide3 in ICGS or FinalMarkerHeatmap in ICGS-NMF), I would assume there was an issue downloading the GO-Elite database, which can be found in the software's AltDatabase folder under EnsMart72/goelite/Hs/gene-mapp/Ensembl-BioMarkers.txt
If that file is present, you can try to add these cell-type enrichment results by finding the text file corresponding to the heatmap of interest (e.g., ICGS-NMF/FinalMarkerHeatmap.txt) and supplying the --clusterGOElite BioMarkers option with the hierarchical clustering command:
python AltAnalyze.py --image hierarchical --platform RNASeq --species Mm --display False --input "/Users/saljh8/DemoData/ICGS-NMF/FinalMarkerHeatmap.txt" --contrast 5 --color_gradient yellow_black_blue --column_method None --row_method None --column_metric cosine --row_metric correlation --normalization median --clusterGOElite BioMarkers
This uses the prior clustering rather than re-clustering (replace None with hopach to re-cluster). The printed output should either indicate cell-type enrichments or report a specific error if something is missing.
Hi Nathan,
Thank you so much for your prompt response. Can you advise the exact command line I can use for the attached TSV matrix file as input? The TSV file is downloadable here.
Using my initial command line, I looked at the ICGS-NMF subdirectory, but it only contains one file, FinalGroups.txt.
Thanks and I hope to hear from you again.
E.
P.S. I can't find the example DemoData/ICGS-NMF/FinalMarkerHeatmap.txt in your GitHub.
Hi Edward,
I used the example path as a local path on my machine, but an example path in the GitHub repository (which was built with an older version of ICGS2 without as nice graphics) is:
GitHub/altanalyze/DemoData/ICGS/10xGenomics/Mm-e14.5_Kidney-GSE104396/precomputed_results/ICGS-NMF
If your output contains FinalGroups.txt, you can post any errors from the log file produced by AltAnalyze (they should be closer to the end; the file is in the designated output directory: AltAnalyze_timestamp.log).
The file you attached is formatted properly, but the extension should just be changed to ".txt". For example:
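For pulling the relevant lines out of a long log, a small Python sketch like the following may help. It is only an illustration: `tail_errors` is a hypothetical helper, and the keyword list and the 200-line window are assumptions, not anything AltAnalyze defines.

```python
from pathlib import Path


def tail_errors(log_path, last_n=200,
                keywords=("Traceback", "Error", "Exception")):
    """Return lines near the end of a log that look like error reports."""
    # errors="replace" keeps the scan going even if the log has odd bytes.
    lines = Path(log_path).read_text(errors="replace").splitlines()
    tail = lines[-last_n:]  # errors are expected close to the end of the log
    return [line for line in tail if any(k in line for k in keywords)]
```

Calling `tail_errors` with the path to the AltAnalyze log in the output directory returns the matching lines, which can then be pasted into a reply.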
python AltAnalyze.py --platform RNASeq --species Mm --excludeCellCycle no --removeOutliers yes --expdir "/Users/saljh8/DemoData/matrix.txt" --output "/Users/saljh8/DemoData/" --runICGS yes --expname test
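If you prefer to keep the original TSV file and create a ".txt" copy for the --expdir option, a minimal Python sketch is shown here. `copy_with_txt_extension` is a hypothetical helper, and the path in the usage comment is only the example path from the command above.

```python
import shutil
from pathlib import Path


def copy_with_txt_extension(src):
    """Copy a tab-delimited file to the same location with a .txt extension."""
    src = Path(src)
    dst = src.with_suffix(".txt")  # e.g. matrix.tsv -> matrix.txt
    if dst != src:
        # Contents are copied unchanged; only the file name differs.
        shutil.copyfile(src, dst)
    return dst


# Example usage (path is illustrative):
# copy_with_txt_extension("/Users/saljh8/DemoData/matrix.tsv")
```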
Hi Nathan,
Thank you for your reply. I tried the following command:
/home/ubuntu/storage2/Tools/altanalyze/AltAnalyze.py --platform RNASeq --species Mm --excludeCellCycle no --removeOutliers yes --runICGS yes --expdir /home/ubuntu/storage2/tmp/test_altanalyze/output/rnaseq_altanalyze_fc2/ExpressionInput/result.txt --output /home/ubuntu/storage2/tmp/test_altanalyze/output --rho 0.4 --column_method None --expname rnaseq_altanalyze_fc2 --ExpressionCutoff 1 --FoldDiff 2 --SamplesDiffering 3
The input file result.txt can be downloaded here and the log file here.
I still cannot produce a plot with cell-type assignments in the ICGS-NMF directory.
As you will notice in the log file, there are some errors. I'm not sure whether they are caused by my data or by a real bug in the code:
File "/home/ubuntu/storage2/Tools/altanalyze/stats_scripts/ICGS_NMF.py", line 1049, in CompleteICGSWorkflow
NMFinput,Rank=NMF_Analysis.FilterGuideGeneFile(Guidefile,Guidefile_block,processedInputExpFile,iteration,platform,uniqueIDs,symbolIDs)
File "/home/ubuntu/storage2/Tools/altanalyze/stats_scripts/NMF_Analysis.py", line 114, in FilterGuideGeneFile
rank_Count=int(q[n-1])
ValueError: invalid literal for int() with base 10: 'NA'
Thanks and I hope to hear from you again. E.
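The ValueError in the traceback above comes from calling int() on the string 'NA'. As a generic illustration of how such a conversion can be made tolerant of missing values, here is a minimal Python sketch. It is not the AltAnalyze code or its fix: `parse_rank` is a hypothetical helper, and whether an 'NA' entry should be skipped or reported as an input problem is a decision for the maintainers.

```python
def parse_rank(value, default=None):
    """Convert a text field to int, returning `default` for non-numeric text."""
    try:
        return int(value)
    except (TypeError, ValueError):
        # 'NA', empty strings and None all end up here.
        return default


print(parse_rank("12"))  # numeric text converts normally
print(parse_rank("NA"))  # missing value falls back to the default
```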
I have the following data, downloadable here.
Now, I'm using the most recent version of AltAnalyze.
However, when I tried the following script:
I cannot get the kind of plot where the cell type is assigned on the left, as in the previous version of AltAnalyze.
I have removed the old version and no longer know which previous version could create it. Please advise how I can go about it.