Open biochem-fan opened 4 years ago
Hi Takanori,
Yes, we have basic Topaz-Relion integration wrappers for denoising and picking, provided by a contributor, that we are going to add once we have tested them. Once we add them to the Topaz repository, we will let you know so that you can try them and improve them as you wish.
Best, -Alex
In addition to what Alex said about RELION wrappers already in the works, I am happy to accept pull requests implementing most of these features as long as they do not change the default topaz interface/behavior. My thoughts on your specific feature requests:
topaz extract < micrograph_paths.txt
.topaz split
command, but it writes these to a target output directory. I would accept a pull request that adds an optional argument to extract for writing individual star files to the same location as their corresponding micrographs. This seems like a useful feature.
@tbepler Thanks for your comment. Sorry, I didn't notice your response.
- Respecting directory structure.
In topaz extract, the output txt file contains only the file name without extension (e.g. 001), even when I run the program with topaz extract DatasetA/*.mrc. I want the output to be DatasetA/001.mrc etc. to distinguish it from DatasetB/001.mrc. Another situation is when processing images from EPU. EPU generates a directory per grid square (e.g. GridSquare_XXXX/Data/FoilHole_YYYY.mrc). When we split the txt file into individual STAR files, we need the path; otherwise we don't know where to write the file.
Yes, topaz extract drops the directory and strips the file extension when creating the image name, for consistency with topaz train and to make it easier to associate the particles with different versions of the micrographs. For example, let's say you have DatasetA/raw/ DatasetA/denoised/ DatasetA/corrected/ ... each containing micrographs named in the same way. Then the particle file output by topaz extract maps to each of these easily. It would be straightforward to add an option to topaz extract to not trim the micrograph paths.
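The trimming described above, and the collision it causes for DatasetA/001.mrc and DatasetB/001.mrc, can be illustrated with a minimal sketch. This is not Topaz's own code; the function names `trimmed_name` and `find_collisions` are hypothetical and only mimic the behavior described in this thread.

```python
import os
from collections import defaultdict


def trimmed_name(path):
    """Drop the directory and the file extension from a micrograph path."""
    return os.path.splitext(os.path.basename(path))[0]


def find_collisions(paths):
    """Return {trimmed name: [paths]} for names shared by more than one path."""
    by_name = defaultdict(list)
    for path in paths:
        by_name[trimmed_name(path)].append(path)
    return {name: group for name, group in by_name.items() if len(group) > 1}


paths = ["DatasetA/001.mrc", "DatasetA/002.mrc", "DatasetB/001.mrc"]
# '001' is reported because it appears in both DatasetA and DatasetB.
print(find_collisions(paths))
```

Running a check like this on a micrograph list before picking shows whether the trimmed names are still unique.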
The problem could also be addressed by adding an option to topaz extract to write the particle coordinates out as individual files to the same locations as the inputs. This may be a more elegant solution, because it would also work for topaz denoise and other commands that write micrographs. For example, if the inputs are topaz extract DatasetA/001.mrc DatasetA/002.mrc DatasetB/001.mrc DatasetB/002.mrc ..., then the outputs would be DatasetA/001_particles.txt DatasetA/002_particles.txt DatasetB/001_particles.txt DatasetB/002_particles.txt or something like that. The "_particles.txt" part could be a user-defined suffix.
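The naming scheme proposed above could look roughly like the following sketch. The helper `particle_file_for` and its `suffix` parameter are hypothetical names chosen for illustration, not an existing Topaz option.

```python
import os


def particle_file_for(micrograph_path, suffix="_particles.txt"):
    """Place the particle file next to its micrograph, replacing the extension."""
    root, _extension = os.path.splitext(micrograph_path)
    return root + suffix


for micrograph in ["DatasetA/001.mrc", "DatasetB/001.mrc"]:
    # The directory is kept, so the two 001 micrographs no longer collide.
    print(micrograph, "->", particle_file_for(micrograph))
```

Because the output path is derived from the full input path, no separate bookkeeping of directories is needed.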
Hello, I was wondering how the RELION integration is going? I gave Topaz a try a few days ago and could not figure out how to get the coordinates integrated into the RELION 3.1.0 workflow. Is there currently a workaround for generating coordinate star files from the particles picked by Topaz?
Thank you in advance, Kevin
Hi Kevin,
Sorry for the delay. There are scripts available here for use as Relion 3.1 plugins:
https://github.com/tbepler/topaz/tree/master/relion_run_topaz
The denoising scripts are complete. The picking scripts are still under development, so consider them as beta releases. I hope to find time in the next week to finish those.
Best, -Alex
run_topaz_pick.py executed from RELION 3.1.0
/home/peter/.conda/envs/topaz/run_topaz_pick.py --o External/job028/ --in_mics Select/job015/micrographs.star --number_of_particles 400 --scale_factor 6 --trained_model $
stops with error:
File "/home/peter/.conda/envs/topaz/run_topaz_pick.py", line 131
g.to_csv(f'{outpath}{k}_topazpick.star' , sep='\t', index=False, columns=['x_coord','y_coord','score'], header=None)
^
SyntaxError: invalid syntax
Any suggestions on what could be the reason and how to make the script functional?
@PiotrDra This sounds like a python version problem. The f'...' syntax requires python 3.6 or newer. Can you check that your topaz install is using python 3?
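A quick way to check which interpreter is in use is sketched below. The helper names are hypothetical. Because a SyntaxError is raised when a file is parsed, this check has to be run on its own (for example saved as a separate file and run with the same python that RELION calls), not added to run_topaz_pick.py.

```python
import sys


def supports_f_strings(version_info=sys.version_info):
    """f-strings were introduced in Python 3.6."""
    return tuple(version_info[:2]) >= (3, 6)


def star_name(outpath, key):
    """str.format equivalent of f'{outpath}{k}_topazpick.star' for older interpreters."""
    return "{}{}_topazpick.star".format(outpath, key)


# Shows the interpreter path, its version, and whether f-strings will parse.
print(sys.executable, sys.version_info[:3], supports_f_strings())
```

If the check prints False, the script is being run by an interpreter older than 3.6, which would explain the error above.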
Hi, My particle is small and consists of 2 homologous globular domains which has made picking very difficult so far and I am really interested to see how Topaz performs, especially with the top-views which are just a dot on the micrograph.
I have tried to use Topaz but cannot manage to extract particles in Relion using the coordinates from Topaz picking. I used the Relion integration of Topaz to Denoise and Train on 7689 micrographs using the particles.star file from my best 3D map as positive labels. A Relion particle extraction job using the coords_suffix_topazpicks.star file (written by run_topaz_pick.py) and micrographs.star file as input failed to extract any particles although many coordinates were written by Topaz picking (stderr: Warning: coordinate file External/job568/__/raw/GridSquare_7115372/Data/FoilHole_8065276_Data_7120079_7120081_20191008_0757_fractions_topazpicks.star does not exist...) Job 568 was the run_topaz_pick.py job
I think that the issue is point 3 made by @biochem-fan : "Respect the directory structure
For example, a user might have Dataset1/001.mrc and Dataset2/001.mrc. Currently Topaz only looks at the file name, so these two get mixed up."
My dataset was collected using EPU and as mentioned above "EPU generates a directory per grid square (e.g. GridSquare_XXXX/Data/FoilHole_YYYY.mrc)".
My directory structure in previous Relion AutoPick jobs was:
AutoPick/jobXXX/__/raw/GridSquare_XXX/Data/FoilHole_XXX_fractions_autopick.star
My micrographs.star file that I use for run_topaz_pick.py contains the following for each micrograph in the _rlnMicrographName and _rlnCtfImage columns: MotionCorr/jobXXX/__/raw/GridSquare_xxx/Data/FoilHole_xxx_fractions.mrc (there is a double underscore directory one up from raw - github doesn't want to write that) CtfFind/jobXXX/__/raw/GridSquare_xxx/Data/FoilHole_xxx_fractions.ctf:mrc
As you can see, the per-grid-square directory structure is carried through and since it is not maintained by Topaz, I cannot use the generated coordinates for further processing in Relion.
Can you please suggest a work-around for this? I don't have any python knowledge and have no idea how to fix this issue. Regards Lizelle
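Judging from the warning and the AutoPick layout quoted above, RELION appears to look for each coordinate file under the picking job directory, at the micrograph's path with the leading job directory removed and the suffix appended. The sketch below encodes that inferred rule; it is an assumption drawn from the paths in this thread, not RELION's actual code, and the function name and job numbers are hypothetical.

```python
import os
import re

# Matches a leading "<JobType>/job<digits>/" such as "MotionCorr/job002/".
JOB_PREFIX = re.compile(r"[^/]+/job\d+/")


def expected_coordinate_file(micrograph_name, pick_job_dir, suffix="_topazpicks.star"):
    """Guess where RELION will look for the coordinates of one micrograph."""
    match = JOB_PREFIX.match(micrograph_name)
    relative = micrograph_name[match.end():] if match else micrograph_name
    root, _extension = os.path.splitext(relative)
    return pick_job_dir.rstrip("/") + "/" + root + suffix


print(expected_coordinate_file(
    "MotionCorr/job002/__/raw/GridSquare_1/Data/FoilHole_2_fractions.mrc",
    "External/job568/",
))
```

Comparing this expected path with the files actually written by the picking job shows whether the per-grid-square directories were lost.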
As a workaround, running extract on each of the directories individually should solve this problem.
Extract can also be run once per micrograph, e.g.
for micrograph in DatasetA/*.mrc DatasetB/*.mrc; do
    topaz extract "$micrograph" ...
done
This also allows writing one output file per micrograph.
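The per-directory workaround can be prepared with a small script such as the sketch below. It only builds the command lines; the function name `extract_commands` and the `extra_args` parameter are hypothetical, and the options topaz extract needs (such as the model and an output file that differs per directory) are deliberately left for the caller to supply.

```python
import os
import shlex
from collections import defaultdict


def extract_commands(micrograph_paths, extra_args=()):
    """Build one 'topaz extract' argument list per directory of micrographs."""
    by_dir = defaultdict(list)
    for path in micrograph_paths:
        by_dir[os.path.dirname(path)].append(path)
    commands = []
    for directory in sorted(by_dir):
        commands.append(["topaz", "extract", *extra_args, *sorted(by_dir[directory])])
    return commands


paths = ["DatasetB/001.mrc", "DatasetA/002.mrc", "DatasetA/001.mrc"]
for command in extract_commands(paths):
    # Printed only; run them with subprocess.run(command) once options are added.
    print(" ".join(shlex.quote(argument) for argument in command))
```

Since each command sees micrographs from a single directory, the trimmed names within one run cannot collide across directories.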
commit 752c140a709c745dabdcc2232b6e9444a11e1ef1 adds support for writing extracted coordinates as one file per micrograph and also adds support for piping the micrograph paths to topaz.
This is a dirty patch but solves the issue of working with images scattered in many sub-directories. When I have time, I will refactor this using my STAR file parser.
diff --git a/relion_run_topaz/run_topaz_pick.py b/relion_run_topaz/run_topaz_pick.py
index 198133e..e8f2d64 100644
--- a/relion_run_topaz/run_topaz_pick.py
+++ b/relion_run_topaz/run_topaz_pick.py
@@ -4,16 +4,22 @@
# This is to run Topaz picker (https://github.com/tbepler/topaz) from Relion as an External job type
# Rafael Fernandez-Leiro 2020 - CNIO - rfleiro@cnio.es
# Alex J. Noble 2020 - NYSBC - anoble@nysbc.org
+# @biochem_fan 2020
# Run with Relion external job
# Provide executable in the gui: run_topaz_pick.py
# Input micrographs.star
# Provide extra parameters in the parameters tab (scalefactor, trained_model, pick_threshold, select_threshold, skip_pi
+# TODO
+# Earlier error check
+# Number of workers
+# Continue
"""Import >>>"""
import argparse
import os
+import re
"""<<< Import"""
"""USAGE >>>"""
@@ -93,13 +99,31 @@ os.system(cmd)
 """make star files >>>"""
 #make star files in the right folder
 print('Making star files...')
-os.system(str('''relion_star_printtable ''')+inargsMics+str(''' data_micrographs _rlnMicrographName | awk -F"/" 'NR==1{
-tmpdf=open(tmpfile).readline().rstrip('\n')
-outopaz_path=outargsPath+tmpdf+'/'
-os.system(str('mkdir ')+outopaz_path+str(';rm ')+tmpfile)
+os.system('relion_star_printtable %s data_micrographs _rlnMicrographName > %s' % (inargsMics, tmpfile))
+
+basename_to_dir = {}
+for line in open(tmpfile):
+    original_filename = line.rstrip()
+    dirname = os.path.dirname(original_filename)
+    filename = os.path.basename(original_filename)
+    filename_without_ext = filename[:filename.rfind('.')]
+    # strip job path
+    m = re.match("[^/]+/job\d+\/", dirname)
+    if m:
+        dirname = dirname[m.end():]
+
+    if filename_without_ext in basename_to_dir:
+        sys.stderr.write("ERROR: Sorry, you cannot have two files with the same, even if they are in different directories")
+        sys.exit(-1)
+    basename_to_dir[filename_without_ext] = dirname
+
+os.remove(tmpfile)
+
 mic_filenames=list(set([x.split('\t')[0] for x in open(outargsResults2).readlines()[1:]]))
 topaz_picks=[x.split('\t') for x in open(outargsResults2).readlines()[1:]]
 for name in mic_filenames:
+    outopaz_path=outargsPath+basename_to_dir[name]+'/'
+    os.makedirs(outopaz_path, exist_ok=True)
     star_file=outopaz_path+name+'_topazpicks.star'
     with open(star_file, 'w') as f:
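The core idea of the patch, mapping each micrograph base name back to its directory below the job folder, can be read in isolation in the sketch below. It is a standalone rewrite for illustration, not the patch itself: the function name is hypothetical, it takes a list of names instead of reading relion_star_printtable output, and it raises an exception instead of calling sys.exit (the posted hunks use sys without showing an import for it).

```python
import os
import re

# Leading "<JobType>/job<digits>/", for example "MotionCorr/job002/".
JOB_PREFIX = re.compile(r"[^/]+/job\d+/")


def map_basename_to_dir(micrograph_names):
    """Map each micrograph base name to its directory with the job path removed."""
    basename_to_dir = {}
    for original in micrograph_names:
        original = original.strip()
        if not original:
            continue
        dirname = os.path.dirname(original)
        stem = os.path.splitext(os.path.basename(original))[0]
        match = JOB_PREFIX.match(dirname)
        if match:
            dirname = dirname[match.end():]
        if stem in basename_to_dir:
            raise ValueError("two micrographs share the base name %r" % stem)
        basename_to_dir[stem] = dirname
    return basename_to_dir


names = ["MotionCorr/job002/__/raw/GridSquare_1/Data/FoilHole_2_fractions.mrc"]
print(map_basename_to_dir(names))
```

As in the patch, base names must still be unique across the whole dataset, because Topaz reports picks by trimmed name only.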
Hi @biochem-fan and @tbepler . Thanks for the advice, I appreciate it!
We've had some PC issues and I haven't tried the new version yet but the patch looks like a good idea. Unfortunately, I've never used one before and don't quite understand how to use it. Should I modify the run_topaz_pick.py script in my Relion directory to match the one above? Should any lines be removed from the original script?
I'm also not too clear on the usage in Relion. I've denoised the selection of micrographs for model training and used the resulting denoised micrographs.star with trained model as input for picking but this denoised micrographs.star file does not contain the directory names anymore, just the file names. Should I use the denoised or the original selection of micrographs (where the directory names are still present) for picking with this patch applied? I don't see how the directory names would be known if I used the denoised micrographs.star file. However, when I used the micrographs.star file before denoising as an input for topaz picking I obtained zero picks.
Can I apply a similar patch to the denoising script and proceed with picking from denoised micrographs.star before running Extraction in Relion using the topaz_picks_scaled.star file (containing the correct directories as part of the micrograph names) and the original (not denoised) micrographs.star file as input?
@LizelleLL
"Unfortunately, I've never used one before and don't quite understand how to use it. Should I modify the run_topaz_pick.py script in my Relion directory to match the one above?"
Yes.
"Should any lines be removed from the original script?"
A line starting with + means add the line; a line starting with - means remove the line.
That being said, if you are not familiar with these things, I recommend that you wait until my patch is tested and incorporated into the official distribution.
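The + and - convention can be made concrete with a short sketch that splits the body of one hunk into the lines before and after the change. This is only an illustration with a hypothetical function name; in practice a tool such as patch or git apply does this work, and the sketch expects hunk body lines only, not the ---/+++ file headers.

```python
def split_hunk(hunk_lines):
    """Return (old_lines, new_lines) for the body of one unified-diff hunk."""
    old_lines, new_lines = [], []
    for line in hunk_lines:
        if line.startswith("@@"):
            continue  # hunk header, carries no file content
        marker, text = line[:1], line[1:]
        if marker == "+":
            new_lines.append(text)  # present only after the change
        elif marker == "-":
            old_lines.append(text)  # present only before the change
        else:
            old_lines.append(text)  # context line, unchanged
            new_lines.append(text)
    return old_lines, new_lines


hunk = [
    " import argparse",
    " import os",
    "+import re",
    "-os.system(str('mkdir ')+outopaz_path+str(';rm ')+tmpfile)",
    "+os.remove(tmpfile)",
]
old, new = split_hunk(hunk)
print("\n".join(new))
```

Lines with neither marker are context: they exist in both versions and only help locate where the change belongs.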
Regarding denoising:
Because I myself don't use denoising, it is of lower priority for me. The idea is the same. I hope the original developers work on it.
Thanks for the reply @biochem-fan. We have an excellent IT person in our unit who should be able to help me use this patch. After Takanori mentioned that he doesn't use denoising, I was wondering, @tbepler: is the picking algorithm affected by denoising, or is denoising simply useful for a person to manually evaluate the model training and picking? If picking is not affected by denoising, can I use denoised micrographs to choose the best parameters for model training on my current 3D model's particles, and also the best parameters for picking, then go back to the original noisy micrographs and use these optimized parameters for training and picking (without manual evaluation, since I won't be able to see my particle)? That way I can use Takanori's patch for picking and proceed with processing in Relion. Please let me know if you think this may work. Thanks, Lizelle
Hi Lizelle,
In our limited tests of training Topaz picking models on raw versus denoised micrographs, we do not see an improvement using denoised micrographs over raw micrographs. So you should use whichever is most convenient for your workflow.
Be aware, however, that we strongly advise that you do not use denoised particles for particle alignment. Please refer to the paragraph on the hallucination problem in the Discussion section of the Topaz-Denoise paper: https://www.nature.com/articles/s41467-020-18952-1
Best, -Alex
Hi All, Our IT guy helped with the patch for topaz picking. He mentioned that there were some line-length problems in the patch from @biochem-fan and I am pasting his cleaner patch here below. The directory structure was maintained and I could use the coordinates for Extraction in Relion. Thanks for all the help!
--- run_topaz_pick.py 2020-10-07 14:48:00.370394000 +0200
+++ /tmp/run_topaz_pick.py 2020-10-21 14:52:34.379936000 +0200
@@ -4,16 +4,22 @@
# This is to run Topaz picker (https://github.com/tbepler/topaz) from Relion as an External job type
# Rafael Fernandez-Leiro 2020 - CNIO - rfleiro@cnio.es
# Alex J. Noble 2020 - NYSBC - anoble@nysbc.org
+# @biochem_fan 2020
# Run with Relion external job
# Provide executable in the gui: run_topaz_pick.py
# Input micrographs.star
# Provide extra parameters in the parameters tab (scalefactor, trained_model, pick_threshold, select_threshold, skip_pick)
+# TODO
+# Earlier error check
+# Number of workers
+# Continue
"""Import >>>"""
import argparse
import os
+import re
"""<<< Import"""
"""USAGE >>>"""
@@ -93,13 +99,31 @@
 """make star files >>>"""
 #make star files in the right folder
 print('Making star files...')
-os.system(str('''relion_star_printtable ''')+inargsMics+str(''' data_micrographs _rlnMicrographName | awk -F"/" 'NR==1{print $(NF-1)}' > ''')+tmpfile)
-tmpdf=open(tmpfile).readline().rstrip('\n')
-outopaz_path=outargsPath+tmpdf+'/'
-os.system(str('mkdir ')+outopaz_path+str(';rm ')+tmpfile)
+os.system('relion_star_printtable %s data_micrographs _rlnMicrographName > %s' % (inargsMics, tmpfile))
+
+basename_to_dir = {}
+for line in open(tmpfile):
+    original_filename = line.rstrip()
+    dirname = os.path.dirname(original_filename)
+    filename = os.path.basename(original_filename)
+    filename_without_ext = filename[:filename.rfind('.')]
+    # strip job path
+    m = re.match("[^/]+/job\d+\/", dirname)
+    if m:
+        dirname = dirname[m.end():]
+
+    if filename_without_ext in basename_to_dir:
+        sys.stderr.write("ERROR: Sorry, you cannot have two files with the same, even if they are in different directories")
+        sys.exit(-1)
+    basename_to_dir[filename_without_ext] = dirname
+
+os.remove(tmpfile)
+
 mic_filenames=list(set([x.split('\t')[0] for x in open(outargsResults2).readlines()[1:]]))
 topaz_picks=[x.split('\t') for x in open(outargsResults2).readlines()[1:]]
 for name in mic_filenames:
+    outopaz_path=outargsPath+basename_to_dir[name]+'/'
+    os.makedirs(outopaz_path, exist_ok=True)
     star_file=outopaz_path+name+'_topazpicks.star'
     with open(star_file, 'w') as f:
         f.write('# version 30001\n\ndata_\n\nloop_\n_rlnCoordinateX #1\n_rlnCoordinateY #2\n_rlnAutopickFigureOfMerit #3\n')
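For readers unfamiliar with the coordinate file being written, the sketch below produces the same header as the last line of the patch, followed by one row per pick. The header string is taken from the patch; the function name and the tab-separated row layout are assumptions made for illustration.

```python
import io

# Header copied from the f.write(...) call in the patch above.
STAR_HEADER = (
    "# version 30001\n\ndata_\n\nloop_\n"
    "_rlnCoordinateX #1\n_rlnCoordinateY #2\n_rlnAutopickFigureOfMerit #3\n"
)


def write_coordinate_star(handle, picks):
    """Write (x, y, score) picks as a RELION-style coordinate STAR table."""
    handle.write(STAR_HEADER)
    for x, y, score in picks:
        # Assumed row layout: one pick per line, columns separated by tabs.
        handle.write("{}\t{}\t{}\n".format(x, y, score))


buffer = io.StringIO()
write_coordinate_star(buffer, [(100, 200, 1.5), (340, 80, 0.25)])
print(buffer.getvalue())
```

Passing an open file instead of the StringIO buffer writes the table to disk, one file per micrograph as in the patch.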
In the next major update of RELION (3.2, not 3.1.x; hopefully early next year), the Topaz wrapper will be integrated into an AutoPick job rather than run as an External job type. It is currently being tested in-house. With that, the problems associated with directories and "Continue" should be solved.
Meanwhile, please use the above patch. @LizelleLL, thanks for the feedback and testing.
@scheres and I are interested in better RELION integration of Topaz.
Several things we wish for are:
topaz convert supports now.
- Respect the directory structure. For example, a user might have Dataset1/001.mrc and Dataset2/001.mrc. Currently Topaz only looks at the file name, so these two get mixed up.
Some of these can be implemented outside Topaz as a separate converter or a wrapper, but I think it is more efficient to have them inside Topaz itself. For example, a wrapper can make a new working directory, make symbolic links to the relevant files, and call Topaz, but this can easily get messy.
@alexjnoble Are you working on any of them? (I saw your tweet: https://twitter.com/alexjamesnoble/status/1267000205838364673) If you are too busy to work on them, I can try myself and send a pull request. Do you have something you don't want to have inside Topaz?