I'm getting the error `TypeError: cannot pickle 'SSLContext' object` and a bunch of other stuff. Any ideas? Looks like it is occurring during the `pool.map` call.
Nice! Without testing this, I have just some feedback on the kwarg structure. How about a single argument, `n_jobs`, with default `n_jobs=1`? Then you can use `if n_jobs != 1:` instead of `if parallel:`, and also simplify invocation of parallel processing for the user, since they can turn it on with a single kwarg.

Related to this, you could implement something similar to `joblib.Parallel`'s setup for specifying cores:
> If -1 is given, joblib tries to use all CPUs. The number of CPUs `n_cpus` is obtained with `cpu_count()`. For `n_jobs` below -1, (`n_cpus + 1 + n_jobs`) are used. For instance, using `n_jobs=-2` will result in all CPUs but one being used.
I've found this really useful: you can set `n_jobs=-2` when running on your local system, which is usually a pretty good call, and you don't have to remember how many cores you have. (Wouldn't want to do that on tamarack though, ha!) Just an optional idea here.
Note the change from `cores` to `n_jobs`, since I believe the latter is more common across the Python ecosystem (see, e.g., joblib and this 3rd-party example).
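For reference, a minimal sketch of how that joblib-style `n_jobs` convention could be resolved into a worker count; the `_resolve_n_jobs` helper is purely illustrative, not something in the package:

```python
import multiprocessing


def _resolve_n_jobs(n_jobs=1):
    """Convert a joblib-style n_jobs value into a positive worker count."""
    if n_jobs == 0:
        raise ValueError('n_jobs == 0 has no meaning')
    n_cpus = multiprocessing.cpu_count()
    if n_jobs < 0:
        # joblib convention: -1 -> all CPUs, -2 -> all CPUs but one, etc.
        return max(1, n_cpus + 1 + n_jobs)
    return n_jobs


# e.g., inside gather_waveforms(..., n_jobs=1):
#     if n_jobs != 1:
#         n_workers = _resolve_n_jobs(n_jobs)
#         ...dispatch to the parallel code path with n_workers processes...
```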
> I'm getting the error `TypeError: cannot pickle 'SSLContext' object` and a bunch of other stuff. Any ideas? Looks like it is occurring during the `pool.map` call.
I think I have an idea what caused this: `pool.map()` doesn't like being passed the `obspy.fdsn.Client` object, so I have it re-establishing a client within the function called by `pool.map()`. I'm really not sure why it was working fine on my local machine, as I hadn't seen that error before. Could you try again with the new push? 🤞
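For context, the workaround looks roughly like the sketch below; the worker function name, client choice, and request tuple are illustrative, not the actual code:

```python
from multiprocess import Pool
from obspy.clients.fdsn import Client


def _gather_increment(request):
    """Worker passed to pool.map(); re-creates the FDSN client inside each
    process because the client object holds an SSLContext that cannot be
    pickled."""
    network, station, starttime, endtime = request
    client = Client('IRIS')  # re-established here rather than passed in
    return client.get_waveforms(network, station, '*', '*', starttime, endtime)


# with Pool(n_workers) as pool:
#     streams = pool.map(_gather_increment, requests)
```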
> Nice! Without testing this, I have just some feedback on the kwarg structure. How about a single argument, `n_jobs`, with default `n_jobs=1`? Then you can use `if n_jobs != 1:` instead of `if parallel:`, and also simplify invocation of parallel processing for the user, since they can turn it on with a single kwarg.
@liamtoney, thanks for this! I agree this is a more streamlined approach, and just the single kwarg is much nicer! I think I've implemented something similar to `joblib.Parallel`.
I have added two more arguments to `gather_waveforms` and `gather_waveforms_bulk`: `parallel` and `cores`. The first specifies whether you want to gather the waveforms in parallel, and with the second you can choose how many cores you would like to allocate to the parallelization. The parallel processing uses the `multiprocess` package and will parallelize by either station or time window, depending on how many cores you select and how many stations are in your request. New functions are `parallel_gather` and `get_increment_waveforms`, the latter of which is required when calling `pool.map()`.

From my testing it should be set up to run as fast as possible, parallelizing by station when the station count exceeds the number of cores requested, and using time windows otherwise (see the sketch below). I experimented a bit with parallelizing by location code (for arrays); however, it didn't seem to improve on splitting by time and I think it adds unnecessary complexity. I also added a warning so that if you request less than a day of data, it will tell you that running in parallel may be slower.
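A rough sketch of that by-station vs. by-time-window dispatch, just to illustrate the idea (the function bodies here are placeholders, not the actual implementation):

```python
from multiprocess import Pool


def get_increment_waveforms(task):
    """Worker called by pool.map(); would re-create the FDSN client and fetch
    the waveforms for a single station or time increment."""
    ...  # placeholder


def parallel_gather(stations, starttime, endtime, cores):
    """Split the request by station or by time window, then map over a pool."""
    if len(stations) > cores:
        # More stations than cores: one task per station over the full window.
        tasks = [([sta], starttime, endtime) for sta in stations]
    else:
        # Otherwise split the request window into `cores` equal increments.
        step = (endtime - starttime) / cores
        tasks = [
            (stations, starttime + i * step, starttime + (i + 1) * step)
            for i in range(cores)
        ]
    with Pool(cores) as pool:
        streams = pool.map(get_increment_waveforms, tasks)
    # The caller would then merge and trim the returned streams.
    return streams
```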
One small unrelated change is on line 179 of `gather_waveforms`. It prints the full stream; however, an issue arises when using `gather_waveforms_bulk`. When using the bulk gather, it prints the unmerged stream, as the merging and trimming are saved for the end and not applied when `gather_waveforms` is called. This can be confusing, as it can sometimes print over 100 traces when your data only contain 30 or so stations. @darren-tpk and I thought about the best way to deal with this and decided that just removing the print statement altogether could be the best option, as users can always check the contents of their stream after using either function.

Also, now that `multiprocess` is a new dependency, I changed a few lines in files that I think(?) are related to readthedocs. Please feel free to remove those changes as I'm not positive they are correct!