Open HeldLab opened 3 years ago
Huh that's very weird that you get NaNs from the target_site_MI.py
command. Sorry that's happening.
To be clear the get_resi_indices
function does exist in the corrAnalysis.py
library (python would likely have thrown an exception if it couldn't find it) - line 705
def get_resi_indices(setResis, allResis):
finalList = []
for i in range(setResis.shape[0]):
indices = np.where(allResis == setResis[i])[0]
finalList.extend(indices)
return np.asarray(finalList)
Would you mind attaching the files you passed over to target_site_MI
as well as providing the command you ran? I'm hoping this is isn't some outdated python imports issue, and is hopefully a simple fix!
Yes, I see def get_resi_indices, I am not sure how I missed it. I have attached the files I am using, and suspect that perhaps the residues file I am testing (just using the indices.csv for now till I see how it works) is the issue since I don't see much documentation on its formatting. Any advice would be appreciated.
Note, these were all made using the fs-peptide tutorial, though target_site_MI isn't covered in that tutorial. I also had to rename the pdb as a .txt file to upload it here.
For the input, I prefer jupyterlab and basically hijacked/bypassed the argparser and remade your target_site_MI.py input collection lines to below. This strategy seemed to work will for all the other CARDS examples in the tutorial:
datafile = 'cards_holistic_MI.csv'
pdbFile = 'fs-peptide.pdb'
resi_file = 'indices.csv'
resiSet = genfromtxt('indices.csv', delimiter=",")
outName = 'targetSiteMIOutput'
data = main_method(datafile, pdbFile, resi_file, np.asarray(resiSet).astype(float))
Thanks for your help, Jason
Sukrit-
I'm circling back to this and wanted to see if you had any ideas based on the files I uploaded. I'm still getting all nan's for the target_site_MI output. The files I'm currently testing are too big to upload, but I can send you a link if you e-mail me at jheld a t wustl / edu. Cheers, Jason
Hi Jason -
Oops so sorry about never getting back to this! I got completely sidetracked by post-PhD moving and beyond.
suspect that perhaps the residues file I am testing (just using the indices.csv for now till I see how it works) is the issue since I don't see much documentation on its formatting.
The indices.csv
file (which is what you mean by residues?) is supposed to indicate the atoms in the topology that represent the dihedral in the corresponding row/column in the MI matrix. So the first row column of your matrix is representing communication with the dihedral corresponding to atoms [0,6,8, 14]. Does this seem reasonable for your topology? The indices file should go [φ, Ψ, χ] in order.
Let me dig into these files and see what's going on - at a glance I'm not sure what might be going on.
It's also worth noting that, to get at larger community structure in these kinds of MI matrices, one thing you can also do is "Affinity Propagation", which basically returns "communities" of residues that communicate within themselves more than other communities. We've used it to generate allosteric networks before in some published papers (check out figure 3B). If this is of interest to your needs, I can send along the code (or push it to github here!)
Again so sorry for the delayed response. Been doing a bit of different type of work post-PhD so haven't given this repo much attention.
Sukrit-
No problem at all, I was busy on many other things and the simulations needed to be extended. That said, I think I fixed it and my error was how I input the '--residues'. If I put in a list of more than 1 floats it seems to work. Now that I am getting a real result, I'll check back when I've had a look if I have questions. In the meantime, I am studying a homo-pentameric pLGIC, and I'm interested in what is making a loop of each subunit move, so if you have any thoughts on whether I should just 'target' the residues of a single loop, or go after all residues of all 5 loops in one go, that would be informative. But I'm more than happy to try both approaches.
I am super interested in your affinity propagation idea, and any code would be welcome! If you prefer, here or email or whatever is best. I could learn a lot from your expertise teasing apart allostery and protein regulation, so any ideas are welcome! Thanks! -Jason
Awesome! Yes I should update the README and apps script to make clearer what the example files look like (slash should just update the code in general since I wrote it early in grad school, lol).
if you have any thoughts on whether I should just 'target' the residues of a single loop, or go after all residues of all 5 loops in one go, that would be informative. But I'm more than happy to try both approaches.
I think ultimately there's not really any good "precedent" for analyzing one vs all at once. I'd probably start with just a single loop and see if MI is reached across chains (I would imagine to some extent, yes). If system is symmetric (ie a single loop's environment is always the same set of residues), then My initial intuition isis that doing all five at once would not be any more valuable than a single loop. However, comparing all-five-at-once to a single loop to be a good consistency check, so maybe that can inform your priorities re: analysis?
Affinity propagation is pretty great! I do have an old notebook that I can quickly push to github (will post here once it's done) that has an example analysis. I used to use NetworkX or a network viewing software like Gephi to visualize these networks. In the meantime, if you use Affinity Propagation directly, you'll at least get all the "Community assignments" for each residue!
Hi-
I am not getting nan correlation results from target_site_MI.py. Example output below..
... Computing for Residue 19.0 Obtained all values for Residue Computing for Residue 20.0 Obtained all values for Residue Computing for Residue 21.0 Obtained all values for Residue Computing for Residue 22.0 Obtained all values for Residue
but the plot that is made is empty.
Since print(corrAmount) returns, [[nan nan] [nan nan] [nan nan] ... [nan nan] [nan nan] [nan nan] [nan nan]]
I think the issue is at the step:
because 'ca.compute_dihedral_based_MI_to_resiSet_error(data, resiSet, allResis, pdbFile, 3.0)' requires these steps in corrAnalysis.py
but, the get_resi_indices function is not defined anywhere in the enspara or cards files on Github. Does that seem correct to you? Where can I find the get_resi_indices function?
Cheers, Jason