Closed — HarryCaveMan closed 33 minutes ago
I am going to attempt to pack two ray nodes (kuberay worker pods) onto each ec2 instance to see if this will help. I suspect that it is packing because the pods have 4 CPU each and the scheduler is not accounting for the neuron resource request. Will update.
Due to limitations in karpenter, I was unable to run the test mentioned above. However, I was able to resolve this by setting the runtime_env for each worker, and by changing my actor class for the map_batches call to use fn_constructor_args rather than trying to maintain one instance of the model internally.
Old batch model example:
from transformers import AutoModel, AutoTokenizer

class Model:
    def __init__(self, path):
        self.model = AutoModel.from_pretrained(path)
        self.tokenizer = AutoTokenizer.from_pretrained(path)

    def predict(self, texts):
        inputs = self.tokenizer(texts)
        return self.model(**inputs)

class BulkModel:
    def __init__(self, path):
        self.path = path
        self.model = None  # loaded lazily on the first batch

    def __call__(self, batch):
        if not self.model:
            self.model = Model(self.path)
        return self.model.predict(batch)

bulk_inference_model = BulkModel('mymodelpath')

# docs is a Ray Dataset created elsewhere
batches = docs.map_batches(
    bulk_inference_model,
    batch_size=128,
    concurrency=4,
    num_cpus=1,
    resources={"neuron_cores": 1},
    scheduling_strategy="SPREAD"
)
New batch model example:
from transformers import AutoModel, AutoTokenizer

class Model:
    def __init__(self, path):
        self.model = AutoModel.from_pretrained(path)
        self.tokenizer = AutoTokenizer.from_pretrained(path)

    def predict(self, texts):
        inputs = self.tokenizer(texts)
        return self.model(**inputs)

class BulkModel:
    def __init__(self, path):
        self.path = path
        self.model = Model(path)  # loaded eagerly in the actor constructor

    def __call__(self, batch):
        return self.model.predict(batch)

batches = docs.map_batches(
    BulkModel,
    fn_constructor_args=('mymodelpath',),
    batch_size=128,
    concurrency=4,
    num_cpus=1,
    resources={"neuron_cores": 1},
    # environment variables go under "env_vars" in runtime_env
    runtime_env={"env_vars": {"NEURON_RT_NUM_CORES": "1"}},
    scheduling_strategy="SPREAD"
)
I still think that I probably should not be setting NEURON_RT_NUM_CORES manually, because it should always reflect the resource request for neuron_cores, but this did prevent Ray Data from trying to allocate more cores than were available.
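One way to make the value follow the resource request instead of hard-coding it (a sketch, assuming ray.get_runtime_context().get_accelerator_ids() reports the neuron cores granted to the worker, which I have not verified for this setup) would be to set the variable inside the actor constructor:

import os
import ray

class BulkModel:
    def __init__(self, path):
        # Assumption: get_accelerator_ids() returns a dict keyed by resource
        # name, e.g. {"neuron_cores": ["0"]}, for workers that requested
        # neuron_cores. Derive NEURON_RT_NUM_CORES from that instead of
        # hard-coding it in runtime_env.
        assigned = ray.get_runtime_context().get_accelerator_ids()
        cores = assigned.get("neuron_cores", [])
        if cores:
            os.environ["NEURON_RT_NUM_CORES"] = str(len(cores))
        self.model = Model(path)  # Model as defined in the examples above

    def __call__(self, batch):
        return self.model.predict(batch)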
Closing because I think you found a workaround for this issue. Feel free to re-open if the issue comes up again.
What happened + What you expected to happen
Have a Ray (2.34.0, Python 3.10) cluster running two inf2.xlarge nodes. Each node has 1 neuron device and 2 neuron_cores. When I run the following (the transform in the reproduction script below), I expect 4 concurrent workers, each occupying one neuron core. Instead, Ray packs all 4 workers onto one node, which causes both OOM errors and errors initializing the Neuron runtime (due to lack of available cores on the device). Unfortunately, STRICT_SPREAD is not a valid scheduling_strategy under ray_remote_options. Additionally, I am unable to use a PlacementGroupSchedulingStrategy with STRICT_SPREAD because bundles only support CPU, GPU, and memory resources and not accelerators, all the way down to the C++ core here.
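For illustration, a sketch of the kind of STRICT_SPREAD placement group this would require. Per the above, the "neuron_cores" entry in the bundles is exactly what is not supported, so this is the attempted shape rather than a working configuration, and the bundle contents are my assumption:

import ray
from ray.util.placement_group import placement_group

# One bundle per inf2.xlarge node, spread strictly across nodes.
# The "neuron_cores" entry is the part that, per the issue, is rejected
# because bundles only accept CPU, GPU, and memory resources.
pg = placement_group(
    [{"CPU": 1, "neuron_cores": 1}] * 2,
    strategy="STRICT_SPREAD",
)
ray.get(pg.ready())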
Versions / Dependencies
ray 2.34.0 (Python 3.10), ray[data]==2.34.0
Reproduction script
Reproduction script
Run the following transform on a Ray cluster containing 2 inf2.xlarge nodes, with the model compiled for batch size 128:
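The script itself was attached to the original issue; as a stand-in, a minimal sketch based on the "Old batch model example" above (it assumes the Model and BulkModel classes, the docs dataset, and the 'mymodelpath' path from that example):

# Assumes Model, BulkModel, and the Ray Dataset `docs` from the
# "Old batch model example" above.
bulk_inference_model = BulkModel('mymodelpath')

batches = docs.map_batches(
    bulk_inference_model,
    batch_size=128,                 # model compiled with batch size 128
    concurrency=4,                  # expect 4 workers, one neuron core each
    num_cpus=1,
    resources={"neuron_cores": 1},
    scheduling_strategy="SPREAD"
)
batches.materialize()  # Ray Data is lazy; consuming the dataset triggers the transform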
Issue Severity
High