michellab / FreeEnergyNetworkAnalysis

Software for automated processing of alchemical free energy calculations

changes for workshop so that the network can be analysed #26

Closed annamherz closed 2 years ago

annamherz commented 2 years ago

Hello! For the September CCPBioSim BSS workshop, freenrgworkflows as it stands is not able to analyse the provided network from the results files (results_X.csv).

The following warning shows up: Provided network is disconnected. Doing analysis on subgraph. However, I think the network should have enough edges.

As a result, when the second set of data is added after the first file has been read, using the following code:

nA = networkanalysis.NetworkAnalyser()
first_file = True
for file_name in results_files:
    if first_file:
        # the first results file builds the graph
        nA.read_perturbations_pandas(file_name, comments='#')
        first_file = False
    else:
        # every later results file is added to the existing graph
        nA.add_data_to_graph_pandas(file_name)

computed_relative_DDGs = nA.freeEnergyInKcal

the following error shows up:

--> 239 self._ddG_edges[u][v] = mean_edge
    240 self._weights[u][v] = prop_error
    242 averaged_edge_counter += 1
KeyError: 'ejm_55'
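The failure mode can be illustrated with a minimal sketch that does not use freenrgworkflows at all. It assumes the edge store is a dictionary of dictionaries whose outer keys are created only for some nodes. The helper update_edge, the ligand names other than ejm_55, and the numbers are hypothetical.

```python
# Hypothetical stand-in for the nested edge dictionary; not the library's code.
ddG_edges = {}

# Assume only these nodes were registered when the first file was read.
for node in ["lig_a", "lig_b"]:
    ddG_edges[node] = {}


def update_edge(u, v, value):
    # Same shape as the failing statement: the outer key u must already exist.
    ddG_edges[u][v] = value


# Works, because 'lig_a' was registered above (value is made up).
update_edge("lig_a", "lig_b", -0.5)

# Fails, because 'ejm_55' was never given an inner dictionary.
try:
    update_edge("ejm_55", "lig_a", 1.2)
    missing = None
except KeyError as err:
    missing = err.args[0]

print(missing)
```

Any node that was skipped when the dictionary was first populated will raise in the same way as soon as a later file refers to it.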

I realised this is because the edge for this ligand does not exist: when the first file was loaded, only the nodes in largest were used:

        # We want to know what the largest component is, so we know if we may not be able to estimate certain free
        # energies
        largest = max(nx.strongly_connected_components(graph), key=len)

        # populate compound list:
        self._compoundList = list(graph.nodes())
        self._compoundList.sort()
        if len(largest) < len(self._compoundList):
            warnings.warn('Provided network is disconnected. Doing analysis on subgraph.')

        # We only do the analysis for the largest connected set of nodes
        for node in largest:

When the analysis was carried out only for the nodes in largest, only about four edges were added to the network in self._ddG_edges. Changing the loop to:

for node in self._compoundList:

as has been done in the copy of freenrgworkflows used in the workshop, allows the full analysis to be carried out for all edges.
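The effect of the loop change can be sketched on a toy graph. This is only an illustration under stated assumptions: populate_edges is a hypothetical helper, the node names and ddG values are made up, and the real method may store its edges differently.

```python
def populate_edges(edges, nodes):
    """Store outgoing edges only for the start nodes listed in `nodes`."""
    stored = {}
    for node in nodes:
        stored[node] = {v: ddg for (u, v, ddg) in edges if u == node}
    return stored


# Illustrative directed edges (start, end, ddG in kcal/mol); values are made up.
edges = [
    ("A", "B", 0.4),
    ("B", "A", -0.4),  # A and B can reach each other
    ("B", "C", 1.1),   # C and D are only reachable in one direction
    ("C", "D", -0.7),
]
compound_list = sorted({n for u, v, _ in edges for n in (u, v)})
largest = {"A", "B"}  # largest strongly connected set of this toy graph

subset = populate_edges(edges, largest)        # like: for node in largest
full = populate_edges(edges, compound_list)    # like: for node in self._compoundList

n_subset = sum(len(inner) for inner in subset.values())
n_full = sum(len(inner) for inner in full.values())
print(n_subset, n_full)
```

In this sketch, iterating over largest leaves C and D without an entry, so a later update starting from either of them would fail, while iterating over the full compound list registers every node.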

I was wondering whether doing the analysis only for the largest connected set of nodes is robust enough to handle networks such as this one?
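One thing that may be relevant to this question: strong connectivity on a directed graph requires a path in both directions between every pair of nodes, so a network can have plenty of edges and still have a very small largest strongly connected component. The pure-Python sketch below shows this on a made-up chain. It is a brute-force illustration, not the networkx implementation, and I have not checked whether one-directional edges are what actually happens with the workshop files.

```python
def reachable(adj, start):
    """Return the set of nodes reachable from `start` along directed edges."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def largest_strong_component(edges):
    """Largest set of nodes that can all reach each other (brute force)."""
    nodes = {n for edge in edges for n in edge}
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
    reach = {n: reachable(adj, n) for n in nodes}
    # The component of n is every node m with n -> m and m -> n.
    components = [{m for m in reach[n] if n in reach[m]} for n in nodes]
    return max(components, key=len)


# A chain A -> B -> C -> D with edges in one direction only.
one_way = [("A", "B"), ("B", "C"), ("C", "D")]
# The same chain with every reverse edge added.
both_ways = one_way + [(v, u) for u, v in one_way]

size_one_way = len(largest_strong_component(one_way))
size_both_ways = len(largest_strong_component(both_ways))
print(size_one_way, size_both_ways)
```

If this turns out to be the cause, checking nx.weakly_connected_components alongside the strongly connected ones might show whether the network is really disconnected or only missing reverse edges.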

jmichel80 commented 2 years ago

Is it safe to accept this PR, @ppxasjsm? We still want to migrate to cinnabar in due course, but need to maintain a functional network analysis tool for the time being.