phyloacc / PhyloAcc

PhyloAcc: software to detect changes in the conservation of a genomic region
GNU General Public License v3.0

How to set cluster options about PBS? #36

Closed by lnyawen 1 year ago

lnyawen commented 1 year ago

Dear @xyz111131 ,

Thank you for developing such great software! I found that the -part "[STRING]" cluster option is specific to Slurm, but the cluster I am using runs PBS. How should I set it?

Yawen

gwct commented 1 year ago

Hello. Yes, all of our development was done on SLURM, but we are ready to add options for other clusters based on user demand. We will actively develop options for PBS based on your request, probably starting from this profile. We'll work to get this implemented as quickly as possible, but I'll need to figure out a way to test the PBS profile since our cluster is SLURM-based.

lnyawen commented 1 year ago

Hi

I'd be glad to help test the PBS profile if you'd like. How should I go about it?

Yawen

gwct commented 1 year ago

Thanks, that's a nice offer! I'll let you know here how we proceed. I'm actually going on vacation for the next 2 weeks though, so it is unlikely I'll be able to do anything until then, unfortunately.

lnyawen commented 1 year ago

OK, tell me what I should do when you get back. Have a good vacation!

Yawen

lnyawen commented 1 year ago

Hello authors,

I deployed a PBS profile following this, and I successfully tested Snakemake with the PBS profile using the command

snakemake -p -s ~/PhyloAcc-test-data/phyloacc-test/phyloacc-job-files/snakemake/run_phyloacc.smk --configfile ~/PhyloAcc-test-data/phyloacc-test/phyloacc-job-files/snakemake/phyloacc-config.yaml --profile pbs-torque --dryrun

PhyloAcc-test-data comes from here.

However, after the batches completed, I used phyloacc_post.py -i phyloacc-test to gather the outputs, but I got an error:

--------------------------------------------------------
**Error OP5: Error reading tree from interface log file!
--------------------------------------------------------

How can I solve this error?

Thanks! Yawen

gwct commented 1 year ago

Hi Yawen, That's great that you got a PBS profile working for PhyloAcc! Would you be ok sharing it so we can try and work it in as an input option?

As for the error, I would need to see your interface log file to start to get an idea for what's happening. Can you copy it here if it isn't too large? Thanks!

lnyawen commented 1 year ago

Hello

Absolutely, I'm glad to share what I did. First, I deployed the PBS profile with these commands:

mkdir -p ~/.config/snakemake
cd ~/.config/snakemake
cookiecutter https://github.com/Snakemake-Profiles/pbs-torque.git
cd pbs-torque && chmod 755 pbs*

Then I ran the following command to create the Snakemake files:

phyloacc.py -a simu_500_200_diffr_2-1.fa -b simu_500_200_diffr_2-1.bed -i id-subset.txt -m ratite.mod -o phyloacc-test -t "strCam;rhePen;rheAme;casCas;droNov;aptRow;aptHaa;aptOwe;anoDid" -g "allMis;allSin;croPor;gavGan;chrPic;cheMyd;anoCar" -n 4 -batch 5 -j 2 -part "core28"

It printed the resulting Snakemake command to the screen:

snakemake -p -s /gpfs/home/liunyw/biosoft/PhyloAcc-test-data/phyloacc-test/phyloacc-job-files/snakemake/run_phyloacc.smk --configfile /gpfs/home/liunyw/biosoft/PhyloAcc-test-data/phyloacc-test/phyloacc-job-files/snakemake/phyloacc-config.yaml --profile /gpfs/home/liunyw/biosoft/PhyloAcc-test-data/phyloacc-test/phyloacc-job-files/snakemake/profiles/slurm_profile --dryrun

I then replaced --profile /gpfs/home/liunyw/biosoft/PhyloAcc-test-data/phyloacc-test/phyloacc-job-files/snakemake/profiles/slurm_profile with --profile pbs-torque.
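The profile swap described above can also be scripted rather than edited by hand. This is only a minimal sketch: the helper name swap_profile and the shortened example command are hypothetical, and it assumes the printed command contains exactly one --profile argument whose value has no spaces.

```shell
# Hypothetical helper (not part of PhyloAcc): replace the value of --profile
# in a Snakemake command string.
#   $1 = the full command as printed by phyloacc.py
#   $2 = the new profile name or path (must not contain '#' or '&',
#        which are special in the sed expression below)
swap_profile() {
    printf '%s\n' "$1" | sed -E "s#--profile[[:space:]]+[^[:space:]]+#--profile $2#"
}

# Shortened stand-in for the command printed by phyloacc.py.
cmd='snakemake -p -s run_phyloacc.smk --configfile phyloacc-config.yaml --profile profiles/slurm_profile --dryrun'

# Print the same command pointing at the cookiecutter PBS profile instead.
swap_profile "$cmd" pbs-torque
```

The rewritten command still ends in --dryrun, so it can be checked before anything is submitted to the queue.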

Finally, I got the message that it ran successfully.

[Fri Oct 14 09:45:53 2022]
Finished job 0.
3 of 3 steps (100%) done
Complete log: .snakemake/log/2022-10-14T093959.452945.snakemake.log
unlocking
removing lock
removing lock
removed all locks

I also checked that the result directory phyloacc-test/phyloacc-job-files/phyloacc-output contains the result files.

These are all the commands I used to test PhyloAcc-test-data with the PBS profile. Is there anything else I need to do?

As for the phyloacc_post.py -i phyloacc-test error, I found only two log files in the phyloacc-test directory: phyloacc-test.log and final-results.log. Is the interface log file you mentioned one of them?

Thanks for your help!

gwct commented 1 year ago

So it just ran with the cookiecutter profile, that's great! That will be easy to incorporate.

For the logfile, I would need to see the phyloacc-test.log. Thanks!
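A quick way to check whether an interface log still carries the tree is to pull out the tree line directly. This is a rough sketch, not how phyloacc_post.py reads the file: the helper name extract_tree is hypothetical, and it assumes the tree is recorded on a line beginning with "# Tree read from mod file:", which is the label used in the version 2.0.0 log in this thread.

```shell
# Hypothetical helper (not part of PhyloAcc): print the Newick string recorded
# in an interface log, or nothing if no such line exists.
#   $1 = path to the interface log, e.g. phyloacc-test/phyloacc-test.log
extract_tree() {
    # Strip the assumed label and the padding after it; keep the first match only.
    sed -n -E 's/^# Tree read from mod file:[[:space:]]*//p' "$1" | head -n 1
}

# Example usage (left as a comment so the snippet has no side effects):
#   extract_tree phyloacc-test/phyloacc-test.log
```

An empty result would point to a missing or truncated log, while a printed tree suggests the log itself is intact and the problem lies elsewhere.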

lnyawen commented 1 year ago

Hello,

This is my phyloacc-test.log file, which is a little large.

[liunyw@mu01 phyloacc-test]$ cat phyloacc-test.log
# Welcome to PhyloAcc -- Bayesian rate analysis of conserved non-coding genomic elements.
# Version 2.0.0 released on April 1, 2022
# PhyloAcc was developed by Zhirui Hu, Han Yan, Gregg Thomas, Tim Sackton, Scott Edwards, and Jun Liu
# Citation:      https://doi.org/10.1093/molbev/msz049
# Website:       https://phyloacc.github.io
# Report issues: https://github.com/phyloacc/PhyloAcc
#
# The date and time at the start is: 10.14.2022 | 09:32:52
# Using Python version:              3.10.6
#
# The program was called as:         /gpfs/home/liunyw/mambaforge-pypy3/envs/PhyloAcc/bin/phyloacc.py -a simu_500_200_diffr_2-1.fa -b simu_500_200_diffr_2-1.bed -i id-subset.txt -m ratite.mod -o phyloacc-test -t strCam;rhePen;rheAme;casCas;droNov;aptRow;aptHaa;aptOwe;anoDid -g allMis;allSin;croPor;gavGan;chrPic;cheMyd;anoCar -n 4 -batch 5 -j 2 -part core28
#
# -----------------------------------------------------------------------------------------------------------------------------
# INPUT/OUTPUT INFO:
# Alignment file:                            /gpfs/home/liunyw/biosoft/PhyloAcc-test-data/simu_500_200_diffr_2-1.fa
# Bed file:                                  /gpfs/home/liunyw/biosoft/PhyloAcc-test-data/simu_500_200_diffr_2-1.bed
# Tree/rate file (mod file from PHAST):      /gpfs/home/liunyw/biosoft/PhyloAcc-test-data/ratite.mod
# Tree read from mod file:                   (((((((((((((((taeGut:0.0465637,ficAlb:0.0538332)taeGut-ficAlb:0.00653656,pseHum:0.0414039)taeGut-pseHum:0.0162337,corBra:0.0350559)taeGut-corBra:0.104721,melUnd:0.0935108)taeGut-melUnd:0.0152322,falPer:0.0676997)taeGut-falPer:0.00595262,((picPub:0.154108,lepDis:0.0567586)picPub-lepDis:0.00987136,halLeu:0.046153)picPub-halLeu:0.00237951)taeGut-picPub:0.00502294,(((aptFor:0.0110665,pygAde:0.0132217)aptFor-pygAde:0.0216787,fulGla:0.0326388)aptFor-fulGla:0.0034278,nipNip:0.0427518)aptFor-nipNip:0.00725913)taeGut-aptFor:0.00238832,(balReg:0.0519596,chaVoc:0.0560994)balReg-chaVoc:0.0048854)taeGut-balReg:0.00599725,((calAnn:0.0977611,chaPel:0.081066)calAnn-chaPel:0.0304959,cucCan:0.101256)calAnn-cucCan:0.00794796)taeGut-calAnn:0.00244451,(colLiv:0.0945655,mesUni:0.0851707)colLiv-mesUni:0.0127853)taeGut-colLiv:0.0304131,((galGal:0.0376982,melGal:0.0420019)galGal-melGal:0.0915582,anaPla:0.0856191)galGal-anaPla:0.0361731)taeGut-galGal:0.0405465,((((((aptHaa:0.00138798,aptOwe:0.00163359)aptHaa-aptOwe:0.00305011,aptRow:0.00410502)aptHaa-aptRow:0.0277314,(casCas:0.0115431,droNov:0.0137378)casCas-droNov:0.0273843)aptHaa-casCas:0.0028791,(rheAme:0.00469461,rhePen:0.00533595)rheAme-rhePen:0.0566016)aptHaa-rheAme:0.00185129,(((cryCin:0.0470774,tinGut:0.038861)cryCin-tinGut:0.0172047,(eudEle:0.0654903,notPer:0.0730502)eudEle-notPer:0.00799637)cryCin-eudEle:0.0671317,anoDid:0.0560433)cryCin-anoDid:0.0251786)aptHaa-cryCin:0.0118409,strCam:0.0513888)aptHaa-strCam:0.0406895)taeGut-aptHaa:0.169725,((allMis:0.00896896,allSin:0.00775865)allMis-allSin:0.0142506,(croPor:0.0178745,gavGan:0.0144863)croPor-gavGan:0.0116871)allMis-croPor:0.147354)taeGut-allMis:0.0317238,(chrPic:0.0287726,cheMyd:0.0316043)chrPic-cheMyd:0.0842993)taeGut-chrPic:0.248317,anoCar:0.248317)taeGut-anoCar;
# Output directory:                          phyloacc-test
# PhyloAcc run directory:                    phyloacc-test/phyloacc-job-files
# Log file:                                  phyloacc-test.log
# -----------------------------------------------------------------------------------------------------------------------------
# DEPENDENCY PATHS:
# Program                                    Specified Path
# PhyloAcc                                   PhyloAcc-ST
# -----------------------------------------------------------------------------------------------------------------------------
# SPECIES GROUPS:
# Group                                      Species
# Targets (-t)                               strCam;rhePen;rheAme;casCas;droNov;aptRow;aptHaa;aptOwe;anoDid
# Conserved (-c)                             taeGut;ficAlb;pseHum;corBra;melUnd;falPer;picPub;lepDis;halLeu;aptFor;pygAde;fulGla;nipNip;balReg;chaVoc;calAnn;chaPel;cucCan;colLiv;mesUni;galGal;melGal;anaPla;cryCin;tinGut;eudEle;notPer
# Outgroups (-g)                             allMis;allSin;croPor;gavGan;chrPic;cheMyd;anoCar
# -----------------------------------------------------------------------------------------------------------------------------
# CLUSTER OPTIONS:
# Option                                     Setting
# Partition(s)                               core28
# Number of nodes                            1
# Max mem per job (gb)                       4
# Time per job                               1:00:00
# -----------------------------------------------------------------------------------------------------------------------------
# OPTIONS INFO:
# Option                                     Current setting                                   Current action
# -i:                                        /gpfs/home/liunyw/biosoft/PhyloAcc-test-data/id-subset.txtOnly loci names specified in this file will be tested.
# -r:                                        st                                                All loci will be run with the species tree model of PhyloAcc
# -burnin:                                   500                                               This number of steps in the chain will discarded as burnin
# -mcmc:                                     1000                                              The number of steps in each chain
# -chain:                                    1                                                 The number of chains to run
# Loci per batch (-batch)                    5                                                 PhyloAcc will run this many loci in a single command.
# Current processes (-n)                     4                                                 This interface will use this many processes.
# Jobs (-j)                                  2                                                 PhyloAcc will submit this many jobs concurrently.
# Processes per job (-p)                     1                                                 Each job will use this many processes.
# --summarize                                False                                             PhyloAcc batch files will be generated and written to the job directory specified above.
# --theta                                    False                                             A species tree with branch lengths in coalescent units will NOT be estimated.
# --quiet                                    False                                             Time, memory, and status info will be printed to the screen while PhyloAcc is running.
# -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
# Date        Time      Current step                            Status                                  Elapsed time (s)    Step time (s)   Current mem usage (MB)   Virtual mem usage (MB)
# -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
# 10.14.2022  09:32:52  Detecting compression of seq file       Success: No compression detected        0.42472             0.03124         70.62109                 4117.99219
# 10.14.2022  09:32:52  Reading input FASTA                     Success: 43 seqs read                   0.43554             0.01041         73.71875                 4118.1875
# 10.14.2022  09:32:52  Reading locus IDs                       Success: 10 IDs read                    0.43591             0.00021         73.71875                 4118.1875
# 10.14.2022  09:32:53  Detecting compression of bed file       Success: No compression detected        0.44941             0.01338         73.71875                 4118.1875
# 10.14.2022  09:32:53  Reading input bed file                  Success: 9 loci read                    0.44991             0.00039         73.71875                 4118.1875
# 10.14.2022  09:32:53  Partitioning alignments by locus        Success: 9 alignments partitioned       0.45018             0.00016         73.71875                 4118.1875
# 10.14.2022  09:32:53  Calculating alignment stats             Success: 9 alignments processed         0.46598             0.01568         73.98828                 4118.1875
# 10.14.2022  09:32:53  Writing: phyloacc-aln-stats.csv         Success: align stats written            0.46671             0.00046         73.98828                 4118.1875
# 10.14.2022  09:32:53  Writing PhyloAcc job files              Success: 2 jobs written                 0.47017             0.0033          73.98828                 4118.1875
# 10.14.2022  09:32:53  Writing Snakemake file                  Success: Snakemake file written         0.47062             0.00028         74.03906                 4118.1875
# 10.14.2022  09:32:53  Writing Snakemake config file           Success: Snakemake config written       0.47097             0.00023         74.03906                 4118.1875
# 10.14.2022  09:32:53  Writing Snakemake cluster profile       Success: Snakemake profile written      0.47146             0.00038         74.03906                 4118.1875
# 10.14.2022  09:32:53  Generating summary plots                Success                                 1.43354             0.96191         97.5625                  4307.90625
# 10.14.2022  09:32:53  Writing HTML summary file               Success                                 1.43424             0.00045         97.5625                  4307.90625
# ===============================================================================================================================================================================
#
# Done!
# The date and time at the end is: 10.14.2022 | 09:32:53
# Total execution time:            1.434 seconds.
# Output directory for this run:   phyloacc-test
# Log file for this run:           phyloacc-test/phyloacc-test.log
# Alignment stats file:            phyloacc-test/phyloacc-aln-stats.csv
# HTML summary file:               phyloacc-test/phyloacc-pre-run-summary.html
#
# PhyloAcc job files successfully generated
# Run the following command to run the PhyloAcc batches:

snakemake -p -s /gpfs/home/liunyw/biosoft/PhyloAcc-test-data/phyloacc-test/phyloacc-job-files/snakemake/run_phyloacc.smk --configfile /gpfs/home/liunyw/biosoft/PhyloAcc-test-data/phyloacc-test/phyloacc-job-files/snakemake/phyloacc-config.yaml --profile /gpfs/home/liunyw/biosoft/PhyloAcc-test-data/phyloacc-test/phyloacc-job-files/snakemake/profiles/slurm_profile --dryrun

# Then, if everything looks right, remove --dryrun to execute
# You may also want to start your favorite terminal multiplexer (e.g. screen, tmux)
# ===============================================================================================================================================================================
#

gwct commented 1 year ago

Hi Yawen, I think I found the problem: the phyloacc_post.py script is still trying to use an old method for reading the trees. In fact I didn't even add the parameter to read it with the new method, so that's the actual error that is occurring. I will try to post an update sometime today or tomorrow and I'll let you know here when that goes through.

lnyawen commented 1 year ago

That's great! Thanks for your help!

gwct commented 1 year ago

Hey sorry for the slow response regarding the post-processing script. The PR with the updated version was stuck in the bioconda queue for a few days. Version 2.1.0 is up now and includes phyloacc_post.py, so you can try conda update phyloacc or just reinstalling it in a fresh environment and the script should now be callable.
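After updating, it can help to confirm that the installed version is at least 2.1.0 before calling the post-processing script. The sketch below only compares version strings: the helper name version_ge is hypothetical, it assumes a sort that supports -V (GNU coreutils does), and the conda commands in the comments use a made-up environment name.

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Typical ways to get the new release (shown as comments, not executed here):
#   conda update phyloacc
#   conda create -n phyloacc-fresh -c conda-forge -c bioconda phyloacc
# The installed version can then be read from the output of:
#   conda list phyloacc

# Example check with a version string typed in by hand.
if version_ge "2.1.0" "2.1.0"; then
    echo "phyloacc_post.py should be available in this version"
fi
```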

lnyawen commented 1 year ago

Hello,

I have updated phyloacc to version 2.1.0, and phyloacc_post.py is working successfully!

Thanks for your kind help!

Yawen