shahab-sarmashghi / RESPECT

Estimating repeat spectra and genome length from low-coverage genome skims

problems running RESPECT #11

Closed svedwards closed 1 year ago

svedwards commented 1 year ago

Hi - your program is exactly what I need but unfortunately it's very difficult to install. I echo the request to put it directly on conda, otherwise it won't be used very widely.

I've tried to get the gurobi license and I think I installed it correctly but I am still getting many errors. If you are at the Broad perhaps we can get together in the new year to troubleshoot (I'm at Harvard). My error message is below. Any files written are empty or only have headings with no genome size estimates. - Scott

(respect) [sedwards@holy7c24101 RESPECT]$ respect -i ../../histdata/moa_bwa_to_LR_emu_kmer_jelly.hist -I ../../moa_ref/Input_read_length.txt -N 10 --debug
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/respect_functions.py:118: FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
  mapping_final = pandas.Series()
2022-12-23 10:31:57,963 INFO:Processing moa_bwa_to_LR_emu_kmer_jelly.hist...
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:234: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self._parameters_dataframe = self._parameters_dataframe.append(
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:241: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self._spectra_dataframe = self._spectra_dataframe.append(pd.Series([iteration] + list(self.repeat_spectra),
2022-12-23 10:31:58,559 INFO:Starting iterations to estimate parameters of moa_bwa_to_LR_emu_kmer_jelly.hist
Set parameter TokenServer to value "rclic1.rc.fas.harvard.edu"
2022-12-23 10:31:59,010 INFO:Set parameter TokenServer to value "rclic1.rc.fas.harvard.edu"
Failed to connect to token server 'rclic1.rc.fas.harvard.edu' (port 41954) - license file '/opt/gurobi/gurobi.lic'. Consult the Quick Start Guide for instructions on starting a token server.
2022-12-23 10:31:59,627 INFO:estimate_genome_skim_parameters finished in 1.5736157894134521 seconds
2022-12-23 10:31:59,627 ERROR:Error occurred when estimating parameters for /n/holylfs04/LABS/edwards_lab/Lab/sedwards/moa/histdata/moa_bwa_to_LR_emu_kmer_jelly.hist; it's skipped
Traceback (most recent call last):
  File "/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/paramter_estimator.py", line 344, in __call__
    return self.estimate_genomic_parameters(*args, **kwargs)
  File "/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/paramter_estimator.py", line 326, in estimate_genomic_parameters
    self.estimate_genome_skim_parameters(spectra_number, error_norm, iterations_number, min_r1l, temperature)
  File "/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/timer.py", line 68, in wrapper_timer
    return func(*args, **kwargs)
  File "/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/paramter_estimator.py", line 281, in estimate_genome_skim_parameters
    optimizer.run_simulated_annealing(iterations_number, min_r1l, temperature)
  File "/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py", line 395, in run_simulated_annealing
    repeat_spectra_next = self.estimate_repeat_spectra(o[1:], poisson_matrix_next[1:, :])
  File "/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py", line 335, in estimate_repeat_spectra
    spectral_residuals = [1.0 * constrained_spectra[i] / norm(constrained_spectra[i:], ord=1) for i in range(
  File "/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py", line 335, in <listcomp>
    spectral_residuals = [1.0 * constrained_spectra[i] / norm(constrained_spectra[i:], ord=1) for i in range(
TypeError: 'NoneType' object is not subscriptable
2022-12-23 10:31:59,628 ERROR:Error occurred while trying to get estimated parameters for a sample
2022-12-23 10:31:59,636 INFO:Writing the results to the output files...
(respect) [sedwards@holy7c24101 RESPECT]$
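
A note on reading this traceback: the final TypeError is most likely a downstream symptom of the token-server failure a few lines earlier, not a separate bug. The sketch below illustrates the pattern under the assumption (not confirmed against RESPECT's source) that the optimization step yields None when Gurobi cannot obtain a license. The names solve_constrained_spectra and spectral_residuals are hypothetical, and the guard shows one way such a failure could be reported more directly.

```python
from typing import List, Optional


def solve_constrained_spectra(license_ok: bool) -> Optional[List[float]]:
    """Hypothetical stand-in for the solver step; returns None on failure."""
    if not license_ok:
        # Assumed failure mode when no Gurobi license can be obtained.
        return None
    return [4.0, 3.0, 2.0, 1.0]


def spectral_residuals(spectra: Optional[List[float]]) -> List[float]:
    """Same shape as the failing expression, with an explicit guard."""
    if spectra is None:
        raise RuntimeError("solver returned no result; check the Gurobi license")
    # Each entry divided by the L1 norm of the tail starting at that entry.
    return [
        1.0 * spectra[i] / sum(abs(v) for v in spectra[i:])
        for i in range(len(spectra))
    ]


# Without the guard, indexing None reproduces the error from the log.
try:
    solve_constrained_spectra(False)[1:]
except TypeError as exc:
    print(exc)
```

With the guard in place, the failure points at the license instead of at a list comprehension.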

shahab-sarmashghi commented 1 year ago

Sorry that you had trouble using it. We have tried to make a conda version, but the dependency on Gurobi has prevented that. New students have joined my former lab working on this project and hopefully they will be able to replace Gurobi with a free and open-source python library to make it available on Conda in the near future.

The error you get seems to be related to installing/using the Gurobi license on a server, which has proven to cause issues. I am at the Broad and would be happy to help you out in person in the new year. In the meantime, if you can tell us more about how you are running RESPECT on this server, we might be able to offer some workarounds. Specifically, where have you installed the license? Is it on the same node where you are running RESPECT? Do you run RESPECT directly, or does the server use a job scheduling system to assign it to a compute node?

svedwards commented 1 year ago

Hi Shahab -

Thanks for responding - particularly around the holidays! Yes, I have installed Gurobi on the same node as I am running respect. I am running it directly on the command line, not on submitted batch jobs that might use a different node. I see on our servers that we have an old Gurobi module (v. 9.5.2) available for public use, but when I use that I get an error saying that my licence can't use that version:

2022-12-26 12:38:32,360 INFO:Starting iterations to estimate parameters of moa_bwa_to_LR_emu_kmer_jelly.hist
Set parameter TokenServer to value "rclic2.rc.fas.harvard.edu"
2022-12-26 12:38:32,783 INFO:Set parameter TokenServer to value "rclic2.rc.fas.harvard.edu"
Request denied: license not valid for Gurobi version 10

I am not familiar with this "rclic2.rc.fas.harvard.edu" port, but otherwise I think I have Gurobi installed in the correct folder.

Anyway, thanks very much for your help. Next week I should be able to get some help from the research computing staff here, but if you know of anything worth trying now, I would be very grateful.

shahab-sarmashghi commented 1 year ago

Hi Scott,

After some digging in Gurobi support pages, here is my guess about what is causing the error based on the information you have provided: The server you are using already has an older version of Gurobi installed which uses a "floating" license to support a cluster system. However, using conda, you have installed the latest version of Gurobi which somehow cannot recognize the license you have obtained. I can think of two possible solutions that you can try:

Please try either of these solutions and let me know what happens.

svedwards commented 1 year ago

Hi Shahab -

**Please see next comment, I think I got it working!**

Thanks for this help. It might make sense for us to move to email so I can share additional details about my set up. You can find my email on my Harvard web site (just google me). But I certainly don't expect you to spend more time on this - I am confident the RC folks here can help me once they open up again next week.

I first tried your option 2 and updated my .bashrc, based on your comments above and the Gurobi support:

export GUROBI_HOME="gurobi1000/linux64"
export PATH="${PATH}:${GUROBI_HOME}/bin"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${GUROBI_HOME}/lib"
export GRB_LICENSE_FILE="/n/home06/sedwards/opt/gurobi.lic"

However, when I run gurobi_cl I get:

[sedwards@boslogin03 ~]$ gurobi_cl
Set parameter Username
Set parameter LogFile to value "gurobi.log"

Failed to set up a license

Error 10009: HostID mismatch (licensed to 6f7f6c9f, hostid is 4ca23b2e)

I could share the info in my gurobi.lic file but that's probably best done on email. Indeed the licence ID in my .lic file is different from the host ID, as the error suggests.
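
For anyone comparing the two IDs by hand: the message reports the host ID the license was issued for and the host ID of the machine running gurobi_cl. A small sketch that pulls both out of the error text follows; the regular expression is written against the exact wording quoted above, and parse_hostid_mismatch is a hypothetical helper, not part of Gurobi. A common reason for a mismatch on a cluster (an assumption here, not something the log confirms) is that the license key was generated on one node and used on another, such as a login node versus a compute node.

```python
import re
from typing import Optional, Tuple

# Pattern matched against the error wording shown in the post above.
HOSTID_MISMATCH = re.compile(
    r"Error 10009: HostID mismatch \(licensed to (\w+), hostid is (\w+)\)"
)


def parse_hostid_mismatch(message: str) -> Optional[Tuple[str, str]]:
    """Return (licensed_hostid, current_hostid), or None if the text differs."""
    match = HOSTID_MISMATCH.search(message)
    if match is None:
        return None
    return match.group(1), match.group(2)


msg = "Error 10009: HostID mismatch (licensed to 6f7f6c9f, hostid is 4ca23b2e)"
ids = parse_hostid_mismatch(msg)
if ids is not None:
    licensed, current = ids
    print(f"license issued for {licensed}, running on {current}")
```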

For what it's worth, I also get a strange message when I run the bash script in the setup instructions:

[sedwards@boslogin03 ~]$ cd gurobi1000/linux64/bin
[sedwards@boslogin03 bin]$ ls
grbcluster grbgetkey grbprobe grb_ts grbtune gurobi_cl gurobi.sh python3.7
[sedwards@boslogin03 bin]$ ./gurobi.sh
./gurobi.sh: line 17: gurobi1000/linux64/bin/python3.7: No such file or directory

which is strange, since python3.7 is right there in the directory when I ls.
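
One plausible explanation, offered as a guess: the path in the error message is relative (gurobi1000/linux64/bin/python3.7), which matches the relative GUROBI_HOME in the .bashrc excerpt above. A relative path is resolved against the current directory, so from inside the bin directory it would point at a location that does not exist. The sketch below flags such variables; find_relative_paths is a hypothetical helper, and the two variable names are the ones from the .bashrc excerpt.

```python
import posixpath
from typing import Dict, List

# Variables from the .bashrc excerpt that are expected to hold a single path.
PATH_VARS = ("GUROBI_HOME", "GRB_LICENSE_FILE")


def find_relative_paths(env: Dict[str, str]) -> List[str]:
    """Return the names in PATH_VARS whose value is set but not absolute."""
    # posixpath is used explicitly because the cluster in question runs Linux.
    return [
        name
        for name in PATH_VARS
        if name in env and not posixpath.isabs(env[name])
    ]


example = {
    "GUROBI_HOME": "gurobi1000/linux64",
    "GRB_LICENSE_FILE": "/n/home06/sedwards/opt/gurobi.lic",
}
print(find_relative_paths(example))
```

To check a live session instead of the example dictionary, pass dict(os.environ) after importing os.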

So, then I thought I would switch to the server version, which I loaded, then when I typed gurobi_cl it said:

[sedwards@holy7c24103 ~]$ gurobi_cl
Set parameter TokenServer to value "rclic2.rc.fas.harvard.edu"
Set parameter LogFile to value "gurobi.log"
Using license file /n/sw/eb/apps/centos7/Gurobi/9.5.2/linux64/gurobi.lic

So I changed my license path in my .bashrc to that path above and then tried to set the TokenServer with: gurobi_cl --server="rclic2.rc.fas.harvard.edu"

and got:

[sedwards@holy7c24103 ~]$ gurobi_cl --server="rclic2.rc.fas.harvard.edu"
Set parameter TokenServer to value "rclic2.rc.fas.harvard.edu"
Set parameter ComputeServer to value "rclic2.rc.fas.harvard.edu"
Set parameter LogFile to value "gurobi.log"

Error 10022: Failed to connect to rclic2.rc.fas.harvard.edu port 80 after 6 ms: No route to host (code 7, command POST http://rclic2.rc.fas.harvard.edu/api/v1/cluster/jobs)

[sedwards@holy7c24103 ~]$ gurobi_cl -t

Checking status of Gurobi token server 'rclic2.rc.fas.harvard.edu'...

Token server functioning normally. Maximum allowed uses: 4096, current: 0

So I feel I am close but still not connecting.

Anyway, don't trouble yourself too much more on this, it's hard to diagnose from a distance.

svedwards commented 1 year ago

I think I may have gotten it to work:

Two files were generated:

estimated-parameters_3.txt:

sample input_type sequence_type coverage genome_length uniqueness_ratio HCRM sequencing_error_rate average_read_length
moa_bwa_to_LR_emu_kmer_jelly.hist histogram genome-skim 6.66 15256279039 0.22 0.23 0.0163 101

estimated-spectra_3.txt:

sample r1 r2 r3 r4 r5
moa_bwa_to_LR_emu_kmer_jelly.hist 3282671707 2019841917 474941038 192376886 680166448
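
The two files can be cross-checked with a little arithmetic. The sketch below uses only the numbers reported above and assumes that uniqueness_ratio is r1 divided by genome_length; that definition is an assumption here (the reported values are consistent with it), so the RESPECT documentation remains the reference.

```python
# Values copied from estimated-parameters_3.txt and estimated-spectra_3.txt.
genome_length = 15256279039
repeat_spectra = [3282671707, 2019841917, 474941038, 192376886, 680166448]  # r1..r5
reported_uniqueness_ratio = 0.22

# Assumption: uniqueness_ratio = r1 / genome_length.
r1_fraction = repeat_spectra[0] / genome_length

print(f"genome length: {genome_length / 1e9:.2f} Gb")   # 15.26 Gb
print(f"r1 / genome_length: {r1_fraction:.3f}")         # 0.215
print(round(r1_fraction, 2) == reported_uniqueness_ratio)
```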

I had to generate a new gurobi licence and point my .bashrc to that licence.

There was a lot of text written to the screen but no errors as far as I can tell:

[sedwards@holy7c24103 ~]$ respect -i /n/holylfs04/LABS/edwards_lab/Lab/sedwards/moa/histdata/moa_bwa_to_LR_emu_kmer_jelly.hist -I /n/holylfs04/LABS/edwards_lab/Lab/sedwards/moa/moa_ref/Input_read_length.txt -N 10 --debug
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/respect_functions.py:118: FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
  mapping_final = pandas.Series()
2022-12-26 23:25:41,276 INFO:Processing moa_bwa_to_LR_emu_kmer_jelly.hist...
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:234: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self._parameters_dataframe = self._parameters_dataframe.append(
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:241: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self._spectra_dataframe = self._spectra_dataframe.append(pd.Series([iteration] + list(self.repeat_spectra),
2022-12-26 23:25:41,901 INFO:Starting iterations to estimate parameters of moa_bwa_to_LR_emu_kmer_jelly.hist
Set parameter Username
2022-12-26 23:25:42,333 INFO:Set parameter Username
Academic license - for non-commercial use only - expires 2023-12-22
2022-12-26 23:25:42,336 INFO:Academic license - for non-commercial use only - expires 2023-12-22
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:234: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self._parameters_dataframe = self._parameters_dataframe.append(
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:241: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self._spectra_dataframe = self._spectra_dataframe.append(pd.Series([iteration] + list(self.repeat_spectra),
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:234: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self._parameters_dataframe = self._parameters_dataframe.append(
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:241: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self._spectra_dataframe = self._spectra_dataframe.append(pd.Series([iteration] + list(self.repeat_spectra),
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:234: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self._parameters_dataframe = self._parameters_dataframe.append(
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:241: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self._spectra_dataframe = self._spectra_dataframe.append(pd.Series([iteration] + list(self.repeat_spectra),
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:234: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self._parameters_dataframe = self._parameters_dataframe.append(
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:241: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self._spectra_dataframe = self._spectra_dataframe.append(pd.Series([iteration] + list(self.repeat_spectra),
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:234: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self._parameters_dataframe = self._parameters_dataframe.append(
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:241: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self._spectra_dataframe = self._spectra_dataframe.append(pd.Series([iteration] + list(self.repeat_spectra),
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:234: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self._parameters_dataframe = self._parameters_dataframe.append(
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:241: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self._spectra_dataframe = self._spectra_dataframe.append(pd.Series([iteration] + list(self.repeat_spectra),
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:234: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self._parameters_dataframe = self._parameters_dataframe.append(
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:241: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self._spectra_dataframe = self._spectra_dataframe.append(pd.Series([iteration] + list(self.repeat_spectra),
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:234: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self._parameters_dataframe = self._parameters_dataframe.append(
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:241: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self._spectra_dataframe = self._spectra_dataframe.append(pd.Series([iteration] + list(self.repeat_spectra),
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:234: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self._parameters_dataframe = self._parameters_dataframe.append(
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:241: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self._spectra_dataframe = self._spectra_dataframe.append(pd.Series([iteration] + list(self.repeat_spectra),
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:234: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self._parameters_dataframe = self._parameters_dataframe.append(
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:241: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self._spectra_dataframe = self._spectra_dataframe.append(pd.Series([iteration] + list(self.repeat_spectra),
2022-12-26 23:25:49,526 INFO:estimate_genome_skim_parameters finished in 8.120830297470093 seconds
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/respect_functions.py:36: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  parameters_dataframe = parameters_dataframe.append(
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/respect_functions.py:50: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  spectra_dataframe = spectra_dataframe.append(pandas.Series([parameter_estimator.output_name] +
2022-12-26 23:25:49,556 INFO:Writing the results to the output files...
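
With this much output it is easy to miss a real failure. The FutureWarning lines are pandas deprecation notices, while the failed run earlier in the thread was marked by lines at the ERROR level. A rough way to confirm a run is clean is to count both; summarize_log below is a hypothetical helper that assumes the "timestamp LEVEL:message" layout seen in these logs.

```python
import re
from collections import Counter

# Matches the logger level in lines such as "2022-12-26 23:25:49,556 INFO:Writing ...".
LEVEL = re.compile(r"\b(INFO|WARNING|ERROR|CRITICAL):")


def summarize_log(text: str) -> Counter:
    """Count logger levels and pandas FutureWarning notices in captured output."""
    counts = Counter(LEVEL.findall(text))
    counts["FutureWarning"] = text.count("FutureWarning:")
    return counts


sample = (
    "2022-12-23 10:31:59,627 ERROR:Error occurred when estimating parameters\n"
    "optimizer.py:234: FutureWarning: The frame.append method is deprecated\n"
    "2022-12-23 10:31:59,636 INFO:Writing the results to the output files...\n"
)
summary = summarize_log(sample)
print(summary["ERROR"], summary["INFO"], summary["FutureWarning"])
```

A count of zero for ERROR, as in the run above, is a reasonable sign that the estimates were written, though checking that the output files are non-empty is still worthwhile.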

shahab-sarmashghi commented 1 year ago

Glad that you got it to work! I hope we can simplify this for our users in the future. I have two comments to bring this to a close:

So, then I thought I would switch to the server version, which I loaded, then when I typed gurobi_cl it said:

[sedwards@holy7c24103 ~]$ gurobi_cl
Set parameter TokenServer to value "rclic2.rc.fas.harvard.edu"
Set parameter LogFile to value "gurobi.log"
Using license file /n/sw/eb/apps/centos7/Gurobi/9.5.2/linux64/gurobi.lic

This probably means it worked, and you could have gone ahead and run RESPECT! For the server version, I think you should not modify .bashrc or set any environment variables; they are probably set at the system level and should not be changed by the user. If you want to try this again, I'd suggest deactivating or removing your conda installation of Gurobi, removing any related environment variables from your .bashrc, and running gurobi_cl to check that you get the same output as above. Then just go ahead and run RESPECT!

PS. To suppress verbose output from RESPECT once you have made sure it runs correctly, you can omit the --debug option.

svedwards commented 1 year ago

Thanks again Shahab. Your suggestions are very helpful. I find that when I use the server version, I still get an error saying that my license is not valid for gurobi v. 10, but that is probably because, as you pointed out earlier, I have installed gurobi v. 10 somehow. At this point I may not rock the boat and just work with new licenses, but in the medium term (like next week!) I'll work with the IT staff here to sort things out.

Thanks again, your help has been much appreciated. Once the analyses are done, I'll let you know what I find out!

Scott

shahab-sarmashghi commented 1 year ago

I see, that totally makes sense. For the server version, you are right, you just need to make sure you load the same Gurobi Python package (gurobipy) that is available on your server (v9.5.2), and not the latest one (v10.0.0). They should be able to help you and easily fix that.
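
A quick way to see which gurobipy the active environment would load is sketched below, using only the standard library. It assumes the major version is what the license check keys on, as the "not valid for Gurobi version 10" wording suggests; parse_version, installed_gurobipy_version and same_major are hypothetical helpers.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '9.5.2' into (9, 5, 2), ignoring any non-numeric parts."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def installed_gurobipy_version() -> Optional[Tuple[int, ...]]:
    """Version of gurobipy in the active environment, or None if absent."""
    try:
        return parse_version(metadata.version("gurobipy"))
    except metadata.PackageNotFoundError:
        return None


def same_major(installed: Tuple[int, ...], server: Tuple[int, ...]) -> bool:
    """Compare major versions only (an assumption about the license check)."""
    return installed[:1] == server[:1]


server_version = parse_version("9.5.2")  # the module version mentioned above
found = installed_gurobipy_version()
if found is None:
    print("gurobipy is not installed in this environment")
else:
    print(found, "matches server:", same_major(found, server_version))
```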

You are very welcome, it actually helped me to better understand how the license should be managed on a server, so thank you! Looking forward to hearing about your findings!