Hi Martin,
Here are the key parameters we typically run with at Janelia:
--conf spark.executor.cores=30
--conf spark.executor.memory=500g
--conf spark.driver.memory=14g
--maxFeatureSourceCacheGb 2
--maxFeatureCacheGb 15
At Janelia, the Spark executors run on nodes with 32 cores, so for each node we give 30 cores to the executor and leave 2 cores for the JVM. We typically run the Spark driver on just one core. We are allocated 15g of memory per core.
Each Spark executor JVM maintains one shared feature source cache containing the source image pixels for each tile and one shared feature cache containing the SIFT features extracted for each tile (or for each tile edge when montage matching). Both --maxFeatureSourceCacheGb and --maxFeatureCacheGb have a default value of 2. My guess is that this is the source of your problem since you are only allocating 4G per worker and it is important to leave some memory for non-cache elements.
I suggest you try increasing your worker memory allocation or decreasing the cache sizes.
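For example (just an illustrative sizing, not a tested recommendation), if you keep 4g executors you would want the two caches together to stay well below the heap:
# hypothetical sizing for a small 4g executor: at most 2g combined for the caches,
# leaving the rest of the heap for feature extraction and matching
--conf spark.executor.memory=4g
--maxFeatureSourceCacheGb 1
--maxFeatureCacheGb 1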
Here is a little more context to help with deciding how to size the caches ...
The feature source cache improves performance for same layer (montage) matching since tiles can be loaded once and then pulled from cache when rendering each of the potentially four clipped edges. The feature source cache is not as important for cross layer matching since tiles are not clipped - cross layer tiles get rendered once and the extracted features then get cached.
The feature cache improves performance for cross layer matching since features can be extracted once and then reused when looking for matches across different layers.
Technically, you could measure the impact of different cache sizes by comparing runs. I have not done that, but it would be interesting to see. Setting the max cache sizes to 0 will disable caching altogether.
Finally, the caches do have to be cleaned up when they fill, so it is a good idea to only make them as large as you really need. Output from the tile pair client is sorted to keep similar pairs together, with the hope of improving cache hits and allowing smaller cache sizes to be sufficient.
Hope this helps, Eric
I've been fiddling around with this for a while now, but I cannot figure out what role the Spark driver plays in the Java memory allocation.
I run Spark standalone and the jobs get assigned to workers properly. However, since the client launches spark-submit with deploy-mode 'client', Java gets its resources from the driver, which in this case is the submission node. That node has very limited resources, so no matter how much memory I assign to the workers, the driver only loads 1024M, and this is why it crashes. When I manually force the driver onto one of the worker nodes (deploy-mode 'cluster'), the driver gets assigned one full worker node with all its resources. BUT: the memory issues are gone.
How can I assign specific resources to the driver? I don't want to waste many CPUs for it.
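(For reference, spark-submit's help does list a --driver-cores option for standalone cluster deploy-mode, so I would expect a sketch like the following to cap the driver at a single core and a few GB of memory; I have not verified this behaviour on my cluster.)
# hypothetical: limit the driver's share of a worker node in cluster deploy-mode
./bin/spark-submit \
--class org.janelia.render.client.spark.SIFTPointMatchClient \
--master ${MASTER_URL} \
--deploy-mode cluster \
--driver-cores 1 \
--driver-memory 4g \
--conf spark.executor.cores=8 \
--conf spark.executor.memory=16g \
${APPLICATION_JAR} ${APP_ARGS}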
And when I run with deploy-mode cluster, this is what it shows me in terms of executors:
This doesn't look right to me: it seems to be running something on only one node. Or is it a problem with the UI?
In fact, this is the only node where things have moved further than:
20/10/19 17:07:56 INFO ResourceUtils: ==============================================================
20/10/19 17:07:56 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
20/10/19 17:07:56 INFO Executor: Starting executor ID 6 on host 10.11.13.130
20/10/19 17:07:56 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40071.
20/10/19 17:07:56 INFO NettyBlockTransferService: Server created on 10.11.13.130:40071
20/10/19 17:07:56 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/10/19 17:07:56 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(6, 10.11.13.130, 40071, None)
20/10/19 17:07:56 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(6, 10.11.13.130, 40071, None)
20/10/19 17:07:56 INFO BlockManager: Initialized BlockManager: BlockManagerId(6, 10.11.13.130, 40071, None)
But not much further; the last update came more than 15 minutes ago...
2020-10-19 17:08:57,880 [Executor task launch worker for task 1] [partition 1] INFO [org.janelia.alignment.match.cache.CanvasFeatureListLoader] load: exit
2020-10-19 17:08:57,880 [Executor task launch worker for task 1] [partition 1] INFO [org.janelia.render.client.spark.SIFTPointMatchClient] derive matches between 0000.0014.00446::TOP and 0000.0022.00446::BOTTOM
2020-10-19 17:08:57,880 [Executor task launch worker for task 1] [partition 1] INFO [org.janelia.alignment.match.CanvasFeatureMatcher] deriveMatchResult: entry, canvas1Features.size=110105, canvas2Features.size=116238
2020-10-19 17:08:58,287 [Executor task launch worker for task 0] [partition 0] INFO [org.janelia.alignment.match.CanvasFeatureExtractor] extractFeatures: exit, extracted 117342 features, elapsedTime=27966ms
2020-10-19 17:08:58,287 [Executor task launch worker for task 0] [partition 0] INFO [org.janelia.alignment.match.cache.CanvasFeatureListLoader] load: exit
2020-10-19 17:08:58,287 [Executor task launch worker for task 0] [partition 0] INFO [org.janelia.render.client.spark.SIFTPointMatchClient] derive matches between 0000.0012.00442 and 0000.0012.00443
2020-10-19 17:08:58,287 [Executor task launch worker for task 0] [partition 0] INFO [org.janelia.alignment.match.CanvasFeatureMatcher] deriveMatchResult: entry, canvas1Features.size=116677, canvas2Features.size=117342
This looks reasonable, but after that nothing more happened...
I somehow have the feeling that the whole Spark standalone setup I am running is missing something crucial to work properly.
Hi Martin,
I set up a new spark cluster that (I hope) is similar to your setup so that I could help you.
A couple of key things that you might have missed are:
The --conf spark.default.parallelism= option should be set to the number of executor cores * the number of executors. I think you only saw 2 active tasks because this parameter was not set.
If you are using the client deploy-mode, you need to use the --driver-memory option instead of the --conf spark.driver.memory= option. Alternatively, you could edit conf/spark-defaults.conf. This is mentioned here: https://spark.apache.org/docs/latest/configuration.html . The driver memory displayed in the web UI seems to show the same value no matter what the actual max memory for the driver is. I don't know why. You can see the actual max by looking at the java process created by spark-submit.
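For example, the same settings could go into the properties file, and the effective driver heap can be checked where spark-submit runs (placeholder values; assumes a Linux host):
# conf/spark-defaults.conf (placeholder values)
spark.driver.memory       14g
spark.default.parallelism 16
# on the node running spark-submit, check the actual -Xmx of the driver JVM
ps -o pid,args -C java | grep -o -- '-Xmx[0-9a-zA-Z]*'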
Here are the steps I took to successfully setup and run my test:
#---------------------------
ssh <master-node>
cd ${SPARK_TEST_DIR}
curl -o spark-3.0.1-bin-hadoop2.7.tgz "https://apache.osuosl.org/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz"
tar xzf spark-3.0.1-bin-hadoop2.7.tgz
cd spark-3.0.1-bin-hadoop2.7
cp conf/log4j.properties.template conf/log4j.properties
./sbin/start-master.sh
# get spark master URL from log
grep "spark://" logs/spark-*Master-1*.out
#---------------------------
ssh <worker-node-1>
cd ${SPARK_TEST_DIR}/spark-3.0.1-bin-hadoop2.7; export MASTER_URL="spark://<master-node>:7077"
./sbin/start-slave.sh --cores 8 --memory 16G ${MASTER_URL}
# check http://<master-node>:8080/ to confirm first worker is there
#---------------------------
ssh <worker-node-2>
cd ${SPARK_TEST_DIR}/spark-3.0.1-bin-hadoop2.7; export MASTER_URL="spark://<master-node>:7077"
./sbin/start-slave.sh --cores 8 --memory 16G ${MASTER_URL}
# check http://<master-node>:8080/ to confirm second worker is there
#---------------------------
ssh <driver-node>
cd ${SPARK_TEST_DIR}/spark-3.0.1-bin-hadoop2.7; export MASTER_URL="spark://<master-node>:7077"
export APPLICATION_JAR="..."; export APP_ARGS="--baseDataUrl ... --pairJson ..."
./bin/spark-submit \
--class org.janelia.render.client.spark.SIFTPointMatchClient \
--master ${MASTER_URL} \
--deploy-mode client \
--driver-memory 2g \
--conf spark.executor.cores=8 \
--conf spark.executor.memory=16g \
--conf spark.default.parallelism=16 \
${APPLICATION_JAR} \
${APP_ARGS} \
--maxFeatureSourceCacheGb 7 \
--maxFeatureCacheGb 7
Let me know if this helps, Eric
I got the general concept. My problem is that I need to assign the node resources dynamically, and I cannot get a driver to launch that way. I can start the master node and register as many workers as I want, but then there is no driver that can connect. If I simply launch another shell on some node, it cannot submit executors; the job has to be submitted from the initial submission node. There, however, I cannot assign resources to the driver, so we end up with the memory problems. When I try to submit from another node, it cannot assign resources and does not connect.
My simple problem is, how do I properly start the d*mn driver...
By now, I think I tried all the various shell juggling scripts that are around on github for getting it to work. But without success...
this is getting frustrating...
aaaaaahh
I have to submit the spark jobs from the same process that starts the master and executors...
It seems to work now.
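For anyone hitting the same wall, the working pattern is roughly the following (a simplified sketch, not my exact script; paths, resource numbers, and the worker launch details are placeholders):
#!/bin/bash
# single SLURM job script: start the master, start a worker on every allocated
# node, then call spark-submit from this same process
cd ${SPARK_DIR}
./sbin/start-master.sh
export MASTER_URL="spark://$(hostname):7077"
# run the workers in the foreground of each srun task so SLURM keeps them alive
srun --ntasks-per-node=1 ./bin/spark-class org.apache.spark.deploy.worker.Worker \
--cores 8 --memory 16G ${MASTER_URL} &
sleep 30   # give the workers time to register with the master
./bin/spark-submit \
--class org.janelia.render.client.spark.SIFTPointMatchClient \
--master ${MASTER_URL} \
--deploy-mode client \
--driver-memory 4g \
--conf spark.executor.cores=8 \
--conf spark.executor.memory=16g \
--conf spark.default.parallelism=16 \
${APPLICATION_JAR} ${APP_ARGS}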
Hi,
I just re-tried running the client on a new stack and got this error again.
I tried running the executors with 32GB each and it failed. Then I removed 4GB from each worker node for the driver (same as the driver memory spec), but still...
Hi Martin,
This line in slurm-6768634.out indicates that everything ran fine but no matches were found:
20/11/17 16:26:20 INFO SIFTPointMatchClient: generateMatchesForPairs: saved matches for 0 out of 120 pairs (0%) on 4 partitions
Which log has the error?
I have these warnings again:
20/11/17 16:26:14 WARN Master: App app-20201117162613-0000 requires more resource than any of Workers could have.
20/11/17 16:26:14 WARN Master: App app-20201117162613-0000 requires more resource than any of Workers could have.
in spark-master.../logs/spark-...
That's why I thought it was a memory issue. I re-ran that data now and still get no matches. This is a bit strange, since the match trials I did look really nice.
Also, for all the workers the stdout fills the .err file instead of the .out file, so I figured that something went wrong during execution. In some of the worker log directories, it also puts the jar file, while in others it doesn't.
Looking at the worker's logs, it seems that it indeed does not identify matches:
2020-11-18 10:06:46,536 [Executor task launch worker for task 0] [partition 0] INFO [org.janelia.alignment.match.CanvasFeatureMatcher] deriveMatchResult: entry, canvas1Features.size=13, canvas2Features.size=6
2020-11-18 10:06:46,536 [Executor task launch worker for task 0] [partition 0] INFO [org.janelia.alignment.match.MatchFilter] filterMatches: filtered 0 inliers from 0 candidates
2020-11-18 10:06:46,537 [Executor task launch worker for task 0] [partition 0] INFO [org.janelia.alignment.match.CanvasFeatureMatcher] deriveMatchResult: exit, result={'consensusSetSizes' : [0], 'inlierRatio' : 0.0}, elapsedTime=0s
2020-11-18 10:06:46,537 [Executor task launch worker for task 0] [partition 0] INFO [org.janelia.render.client.spark.SIFTPointMatchClient] derived matches for 0 out of 30 pairs, cache stats are CacheStats{hitCount=0, missCount=236, loadSuccessCount=236, loadExceptionCount=0, totalLoadTime=9073800455, evictionCount=0}
These are the parameters that I use for launching:
--SIFTfdSize 6 --SIFTminScale 0.75 --SIFTmaxScale 1 --SIFTsteps 4 --matchRod 0.5 --matchMaxEpsilon 3.0 --matchMinInlierRatio 0.0 --matchMinNumInliers 9 --matchMaxNumInliers 200 --maxFeatureCacheGb 6 --maxFeatureSourceCacheGb 6 --renderScale 0.5 --matchIterations 400 --matchMaxTrust 4
And this is the match trial that works for many random tile pairs that I checked:
"siftFeatureParameters" : {
"fdSize" : 6,
"minScale" : 0.75,
"maxScale" : 1.0,
"steps" : 4
},
"matchDerivationParameters" : {
"matchRod" : 0.5,
"matchModelType" : "SIMILARITY",
"matchIterations" : 400,
"matchMaxEpsilon" : 3.0,
"matchMinInlierRatio" : 0.0,
"matchMinNumInliers" : 9,
"matchMaxTrust" : 4.0,
"matchFilter" : "SINGLE_SET",
"matchFullScaleCoverageRadius" : 150.0
}
with
"stats" : {
"pFeatureCount" : 3122,
"pFeatureDerivationMilliseconds" : 962,
"qFeatureCount" : 3210,
"qFeatureDerivationMilliseconds" : 857,
"consensusSetSizes" : [ 22 ],
"matchDerivationMilliseconds" : 3707,
"aggregateDeltaXStandardDeviation" : 0.9397684112672783,
"aggregateDeltaYStandardDeviation" : 0.8933347787834419,
"consensusSetDeltaXStandardDeviations" : [ 0.9397684112672783 ],
"consensusSetDeltaYStandardDeviations" : [ 0.8933347787834419 ],
"overlappingImagePixels" : 397932,
"overlappingCoveragePixels" : 302708,
"matchQualityMilliseconds" : 9
}
Do you spot any discrepancy?
It seems I am missing a few parameters...
I now added
--matchModelType SIMILARITY --matchFullScaleCoverageRadius 150.0
Still no matches...
Is it the filters? How can I control these in the parameters?
By default, the built-in filter is enabled for the SIFTPointMatchClient (equivalent to including filter=true in the match trial render parameters URL). The filter is intended as a simplistic intensity correction measure. You can disable it in the client by including --renderWithFilter false.
I'm happy to take a closer look to see if I can identify what's different between your trial run and your client run.
Minimally, please send me:
That should be enough to spot the issue. If not, I might ultimately need the source tile images for the pair.
Oh gosh, my stupid...
Thanks for that simple suggestion of putting the calling command into the log. That led me to my mistake: I was pulling the parameters from a wrong JSON file that still referred to the wrong pixel scaling...
Indeed, I did not have any clipping parameter in there. Does it make sense to use one? What is the default overlap zone it searches in for in-plane tiles (TOP-BOTTOM, ...)?
Clipping greatly improves performance for montage (in-plane) tile matching. To see how much, just run 2 trials - one with clipping and one without. There is no default overlap zone (the default is no clipping). You specify how much to clip through the --clipWidth and --clipHeight parameters supported by the various match clients.
How much to clip simply depends upon how much overlap there was during acquisition. Ideally, you want to clip your largest possible overlap but no more.
In general, same layer match sources should be clipped and rendered at higher scales while cross layer match sources should not be clipped and can be rendered at lower scales.
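As a rough sketch (the clip sizes and scales here are placeholders; pick them from your actual acquisition overlap):
# montage (same layer) pairs: clip to the overlap zone, render at a higher scale
--clipWidth 500 --clipHeight 500 --renderScale 0.5
# cross layer pairs: no clipping, render at a lower scale
--renderScale 0.25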
Hi,
I am running into
java.lang.OutOfMemoryError: Java heap space
when trying to run the SIFT client on Spark. Even though every worker has 4GB of memory assigned, it only initializes them with 1024M and shows them with much less (366M) when running. Any idea what could cause this?