zhxiaokang / RASflow

RNA-Seq analysis workflow
MIT License

Issue with visualization #3

Closed conlank closed 4 years ago

conlank commented 4 years ago

I am getting an error with the visualization step.

```
Querying chunk 56
Finished
Pass returnall=TRUE to return lists of duplicate or missing query terms.
Error in '$<-.data.frame'('tmp', "lab", value = c("Lcn2", "Cd24a", "Sprr1a", : replacement has 55422 rows, data has 55421
Calls: plot.volcano.heatmap -> EnhancedVolcano -> $<- -> $<-.data.frame
Execution halted
```

I am also getting this warning:

```
Loading required package: ggrepel
Warning message:
In readLines(con) :
incomplete final line found on 'configs/config_main.yaml'
Querying chunk 1
Querying chunk 2
```

Any help with these? Am I missing a package in R?

I also noticed that it's not possible to use main.py to run just the visualization without redoing the DEA. Running it with DEA turned off returns:

Which mapping reference will be used? genome
Is DEA requred? False
Is visualization requred? True
Please double check the information above
Do you want to continue? (y/n) y
Start RASflow on project: project
Trimming is not required
Start mapping using genome as reference!
Building DAG of jobs...
Nothing to be done.

zhxiaokang commented 4 years ago

Thank you for reporting!

Regarding "it's not possible to use the main.py to run just the visualization without redoing the DEA": this follows from the logic of the workflow. Visualization plots the results of DEA, so it cannot run until DEA has finished, which means DEA has to be set to "yes" whenever you want visualization. If you have already finished DEA and now only want the visualization, don't worry about "redoing" it: when Snakemake finds that the DEA output files are already there, it skips DEA.
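To illustrate why an already finished DEA is not repeated, here is a rough sketch of the kind of check a workflow engine makes before running a step. It is a simplified illustration only, not Snakemake's actual implementation, and the function name `needs_rerun` is made up for this example.

```python
import os


def needs_rerun(inputs, outputs):
    """Return True if a step should run again.

    Simplified rule (an assumption for illustration): run the step when
    any output file is missing, or when any input is newer than the
    oldest output. Otherwise the step can be skipped.
    """
    # A missing output always forces the step to run.
    if not all(os.path.exists(path) for path in outputs):
        return True
    # With no inputs to compare against, existing outputs are enough.
    if not inputs:
        return False
    newest_input = max(os.path.getmtime(path) for path in inputs)
    oldest_output = min(os.path.getmtime(path) for path in outputs)
    return newest_input > oldest_output
```

Under this rule, a DEA step whose result files exist and are newer than its inputs is skipped, and only the visualization step is left to run.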

As for the errors, a missing package won't be the reason, because once you set up the environment with env.yaml, all required packages are installed.

For "incomplete final line" in the YAML file, my guess is that when you modified config_main.yaml, you accidentally removed the newline at the end of the last line. Try adding an empty line at the end of the file.
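If editing the file by hand is inconvenient, a small helper along these lines could append the missing newline. This is only a sketch; `ensure_trailing_newline` is a hypothetical name and is not part of RASflow.

```python
def ensure_trailing_newline(path):
    """Append a newline to the file if its last line lacks one.

    Returns True if the file was changed, False if it was already fine
    (or empty).
    """
    with open(path, "rb") as fh:
        data = fh.read()
    if data and not data.endswith(b"\n"):
        # Open in append mode so the existing content is left untouched.
        with open(path, "ab") as fh:
            fh.write(b"\n")
        return True
    return False


# Example call, using the path from the warning message:
# ensure_trailing_newline("configs/config_main.yaml")
```

Running it a second time on the same file does nothing, so it is safe to call before each run.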

For the volcano plot error, I managed to reproduce it on a new dataset. It happens because when the package 'mygene' finds multiple gene symbols for one gene ID, it returns all of them (in your case, one gene ID has two gene symbols), so the list of labels ends up one row longer than the data. The problem is now fixed by commit. Simply update your local files with `git pull`. The fix needs an extra package, though; install it in your environment with `conda install -c conda-forge r-hash`.
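The row-count mismatch can be pictured with a small sketch. This is not the code from the commit (the workflow itself does this step in R); it only shows one way to collapse lookup results to exactly one label per gene ID. The function name, the `(query_id, symbol)` pair format, and the placeholder IDs are all assumptions made for the example.

```python
def one_symbol_per_id(gene_ids, hits):
    """Return exactly one label per gene ID, in the order of gene_ids.

    hits is a list of (query_id, symbol) pairs from a symbol lookup and
    may contain several symbols for the same ID, or none at all.
    """
    first_symbol = {}
    for query_id, symbol in hits:
        # Keep only the first symbol seen for each ID.
        first_symbol.setdefault(query_id, symbol)
    # Fall back to the ID itself when no symbol was found.
    return [first_symbol.get(gene_id, gene_id) for gene_id in gene_ids]


gene_ids = ["id1", "id2", "id3"]
hits = [("id1", "SymA"), ("id2", "SymB"), ("id2", "SymB2")]
labels = one_symbol_per_id(gene_ids, hits)
print(labels)  # ['SymA', 'SymB', 'id3']
```

Because the result always has the same length as `gene_ids`, it can be attached as a label column without the "replacement has ... rows" error.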

zhxiaokang commented 4 years ago

The issue seems solved. I'll close it. But don't hesitate to comment below if there's any further problem.