victormurcia / PyUMLS_Similarity

This package computes a variety of similarity metrics between concepts present in the UMLS database. It also serves as a way to interface with the UMLS.
https://pypi.org/project/PyUMLS-Similarity/
MIT License

get error running the similarity code #1

Closed dianizzah closed 11 hours ago

dianizzah commented 1 week ago

from PyUMLS_Similarity import PyUMLS_Similarity

mysql_info = {}
mysql_info = {
    "username": "username",
    "password": "password",
    "hostname": "10.33.3.45",
    "socket": "MYSQL",
    "database": "umls"
}
umls_sim = PyUMLS_Similarity(mysql_info=mysql_info, work_directory='D:\apps\Strawberry\perl\bin')

cui_pairs = [('C0018563', 'C0037303'), ('C0035078', 'C0035078'),]
measures = ['lch', 'wup']
similarity_df = umls_sim.similarity(cui_pairs, measures)

The output is:

Error calculating similarity for measure 'lch': [WinError 2] The system cannot find the file specified
Error calculating similarity for measure 'wup': [WinError 2] The system cannot find the file specified
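A side note on the work_directory value in the snippet above, which may or may not be related to the error: in a regular (non-raw) Python string literal, "\a" and "\b" are escape sequences, so a Windows path containing "\apps" or "\bin" is silently altered. The minimal sketch below uses a shortened, hypothetical path purely to show the effect; a raw string or forward slashes avoids it.

```python
# Shortened, hypothetical path used only to illustrate escape handling.
plain = 'D:\apps\bin'     # "\a" becomes BEL (0x07), "\b" becomes backspace (0x08)
raw = r'D:\apps\bin'      # raw string: every backslash is kept literally
forward = 'D:/apps/bin'   # forward slashes need no escaping at all

print(repr(plain))        # shows the control characters that replaced "\a" and "\b"
print(repr(raw))
print(len(plain), len(raw))
```

If the two lengths differ, the non-raw literal does not contain the path that was typed.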

victormurcia commented 5 days ago

Hmm, that seems to indicate that it can't find either the files associated with those similarity measures or the input file that's created from the CUI pairs.

Did you run:

cpanm UMLS::Interface --force
cpanm UMLS::Similarity --force

Those commands install the Perl packages for the UMLS Interface and UMLS Similarity modules, which are required to connect to the database.

Also, to ensure that the Perl package is working correctly, could you try the following:

  1. Create a system variable called UMLSINTERFACE_CONFIGFILE_DIR and set it to a directory where we'll store some logging files associated with the routine.
  2. Create a .txt file called umlssim_config.txt that has the following contents:
    SAB :: include MSH
    REL :: include CHD,PAR

    This file basically tells the program to use the MSH source vocabulary of the UMLS and to use the CHD and PAR relations to determine paths between concepts. For this example, let's say that we'll save it at D:\UMLS_Output\umlssim_config.txt

  3. Start up a Perl command line.
  4. Navigate to the location of the file called umls-similarity.pl. The default location should be somewhere in 'C:\Strawberry\perl\site\bin'; in your case it seems that the file is in 'D:\apps\Strawberry\perl\bin'.
  5. Once there, run the following command:
umls-similarity.pl --measure path --config D:\UMLS_Output\umlssim_config.txt --username root --password password --hostname localhost --socket MYSQL --database umls --verbose hand skull
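The manual check in the steps above could also be scripted. The following is only a sketch: the helper names write_config and build_command are hypothetical (they are not part of PyUMLS_Similarity), the flags are copied from the command above, and the config is written to a temporary directory rather than D:\UMLS_Output. It only builds the argument list; passing it to subprocess.run is left to the reader once the paths are confirmed.

```python
import tempfile
from pathlib import Path


def write_config(directory):
    """Write the two-line umlssim_config.txt described in step 2 (hypothetical helper)."""
    config_path = Path(directory) / "umlssim_config.txt"
    config_path.write_text("SAB :: include MSH\nREL :: include CHD,PAR\n", encoding="ascii")
    return config_path


def build_command(script, config_path, term1, term2):
    """Assemble the umls-similarity.pl call from step 5 as an argument list (hypothetical helper)."""
    return [
        "perl", str(script),
        "--measure", "path",
        "--config", str(config_path),
        "--username", "root",
        "--password", "password",
        "--hostname", "localhost",
        "--socket", "MYSQL",
        "--database", "umls",
        "--verbose",
        term1, term2,
    ]


# Demonstration only: write the config to a temporary directory and show the command.
with tempfile.TemporaryDirectory() as tmp:
    cfg = write_config(tmp)
    print(" ".join(build_command("umls-similarity.pl", cfg, "hand", "skull")))
```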

When you run that, you should get an output that looks something like what's shown below: [image]

You should also check your MySQL server and verify that a new schema called umlsinterfaceindex has been created, as shown below. Within it you'll see a few tables, and one of them should have three columns (CUI, DEPTH, PATH) that hold all the various paths for a concept from the root node of the UMLS (C0000000). [image]

Lastly, you may also want to check the directory you set for UMLSINTERFACE_CONFIGFILE_DIR. There you'll find a file called MMSYS_2023AA_20230420_MSH_CHD_PAR_table that has the same contents as the table I mentioned earlier. It is a good thing to check when creating indexes for larger ontologies such as SNOMED. [image]

Let me know if that helps.

dianizzah commented 4 days ago

I have encountered persistent issues while attempting to use Strawberry Perl to install the UMLS modules, leading to repeated failures. Consequently, I switched to ActiveState Perl, which allowed for the installation and configuration of the desired modules. However, this approach still results in the same error previously mentioned. I implemented your suggested solution and successfully created the umlsinterfaceindex database, just as you did.

Despite this progress, I continue to face the same issues. I believe the problem stems from incorrect directory paths for perl.exe and umls-similarity.pl. Even after changing the work_directory parameter of PyUMLS_Similarity to my ActiveState Perl bin folder, where my perl.exe is located (C:/Users/asus/AppData/Local/ActiveState/cache/cdce2579/bin), the error persists.

Could you provide guidance on how to correctly change the directory to resolve this issue?

mysql_info = {}
mysql_info = {
    "username": "username",
    "password": "password",
    "hostname": "localhost",
    "socket": "MYSQL",
    "database": "umls"
}
umls_sim = PyUMLS_Similarity(mysql_info=mysql_info, work_directory="C:/Users/asus/AppData/Local/ActiveState/cache/cdce2579/bin")

cui_pairs = [
    ('C0018563', 'C0037303'),
    ('C0035078', 'C0035078'),
]

measures = ['path']
similarity_df = umls_sim.similarity(cui_pairs, measures)

The output produced:

C:\Strawberry\perl\bin\perl.exe C:\Strawberry\perl\site\bin\umls-similarity.pl --database=umls --username=username--password=password--hostname=localhost --socket=MYSQL --measure=path --precision=4 --forcerun --infile=C:\Users\asus\AppData\Local\Temp\umls-similarity-temp.txt
Error calculating similarity for measure 'path': [WinError 2] The system cannot find the file specified
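The logged command still points at the C:\Strawberry locations, so one quick way to narrow down a [WinError 2] is to check which of those paths actually exist on the machine. A minimal sketch, assuming the two paths from the log line above; missing_paths is a hypothetical helper, not part of the package:

```python
from pathlib import Path


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on this machine."""
    return [str(p) for p in paths if not Path(p).exists()]


# Paths taken from the logged command above.
candidates = [
    r"C:\Strawberry\perl\bin\perl.exe",
    r"C:\Strawberry\perl\site\bin\umls-similarity.pl",
]

for path in missing_paths(candidates):
    print("not found:", path)
```

Any path reported here is a likely candidate for the file the system could not find.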

For any additional information required, here is my system environment:

Operating System: Windows 11 x64
MySQL Version: 8.0.37
Perl Version: 5.36.3
DBI Version: 1.643
DBD::mysql Version: 4.052
UMLS::Interface Version: 1.51
UMLS::Similarity Version: 1.49

victormurcia commented 4 days ago

Ah okay, I've found the errors.

The issue is that I initially wrote the code under the assumption of a standard Strawberry Perl installation. As a result, there are a few lines where I hard-coded the cwd as cwd = r'C:\Strawberry\perl\site\bin' (this is where the various .pl files associated with the UMLS Interface and UMLS Similarity modules are located), as well as the location of perl.exe.
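To illustrate the general idea of replacing hard-coded locations with overridable defaults, here is a rough sketch. It is not the package's actual code: resolve_perl_paths and its perl_exe and script_dir parameters are hypothetical names, and the defaults are simply the standard Strawberry Perl locations mentioned in this thread. PureWindowsPath is used so the path handling behaves the same on any platform.

```python
from pathlib import PureWindowsPath

# Defaults matching a standard Strawberry Perl installation (as described above).
DEFAULT_PERL = r"C:\Strawberry\perl\bin\perl.exe"
DEFAULT_SCRIPT_DIR = r"C:\Strawberry\perl\site\bin"


def resolve_perl_paths(perl_exe=None, script_dir=None):
    """Return (perl executable, umls-similarity.pl path), preferring caller overrides.

    Hypothetical helper: falls back to the Strawberry defaults when an
    argument is not supplied.
    """
    perl = PureWindowsPath(perl_exe or DEFAULT_PERL)
    scripts = PureWindowsPath(script_dir or DEFAULT_SCRIPT_DIR)
    return str(perl), str(scripts / "umls-similarity.pl")


# Standard installation: nothing to pass.
print(resolve_perl_paths())

# Non-standard installation: override one or both locations.
print(resolve_perl_paths(script_dir="D:/apps/Strawberry/perl/bin"))
```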

That's an easy fix on my end. I'll make those corrections by the end of the day and that should hopefully resolve the pathing problems you are having.

victormurcia commented 11 hours ago

I've released version 0.1.1, which allows users to specify the location of perl.exe and the working directory in case Perl is installed somewhere other than the standard location. This has fixed the problem above. If you run into a separate problem, please open a new issue.

Thanks!