To resolve the WORKER TIMEOUT and code 134 errors when using the NamastexLabs LLM Data Iterator to iterate through a large CSV file in Langflow, you can adjust the `max_concurrency` and `use_multithreading` parameters in the `DirectoryComponent` configuration.
Here is an example of how you can configure these parameters:
```python
def build(
    self,
    path: str,
    depth: int = 0,
    max_concurrency: int = 1,  # Reduce concurrency to avoid timeouts
    load_hidden: bool = False,
    recursive: bool = True,
    silent_errors: bool = False,
    use_multithreading: bool = False,  # Disable multithreading to avoid code 134 errors
) -> List[Optional[Record]]:
    resolved_path = self.resolve_path(path)
    file_paths = retrieve_file_paths(resolved_path, load_hidden, recursive, depth)
    loaded_records = []
    if use_multithreading:
        loaded_records = parallel_load_records(file_paths, silent_errors, max_concurrency)
    else:
        loaded_records = [parse_text_file_to_record(file_path, silent_errors) for file_path in file_paths]
    loaded_records = list(filter(None, loaded_records))
    self.status = loaded_records
    return loaded_records
```
By setting `max_concurrency` to a lower value and disabling `use_multithreading`, you can mitigate the issues related to worker timeouts and code 134 errors [1].
please explain the code 134 errors @dosu.
Code 134 errors typically indicate that a process was terminated due to an abort signal, often caused by a critical issue such as a segmentation fault or an out-of-memory condition. In the context of Langflow, these errors can occur when the system runs out of resources or encounters a severe error during execution.
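For reference, exit codes above 128 on POSIX systems encode the terminating signal (code minus 128), so 134 corresponds to signal 6, SIGABRT. A quick standard-library snippet to decode such codes:

```python
import signal

# Exit codes above 128 mean "terminated by signal (code - 128)".
# 134 - 128 = 6, i.e. SIGABRT, which is raised by abort() on assertion
# failures, allocator corruption, or out-of-memory aborts.
exit_code = 134
print(signal.Signals(exit_code - 128).name)  # SIGABRT
```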
To mitigate code 134 errors, you can try the following:

- Lower the `max_concurrency` setting to reduce the number of concurrent tasks.
- Set `use_multithreading` to `False` to avoid potential issues with multithreading.

Here's an example configuration:
```python
def build(
    self,
    path: str,
    depth: int = 0,
    max_concurrency: int = 1,  # Reduce concurrency to avoid timeouts
    load_hidden: bool = False,
    recursive: bool = True,
    silent_errors: bool = False,
    use_multithreading: bool = False,  # Disable multithreading to avoid code 134 errors
) -> List[Optional[Record]]:
    resolved_path = self.resolve_path(path)
    file_paths = retrieve_file_paths(resolved_path, load_hidden, recursive, depth)
    loaded_records = []
    if use_multithreading:
        loaded_records = parallel_load_records(file_paths, silent_errors, max_concurrency)
    else:
        loaded_records = [parse_text_file_to_record(file_path, silent_errors) for file_path in file_paths]
    loaded_records = list(filter(None, loaded_records))
    self.status = loaded_records
    return loaded_records
```
Adjusting these settings can help prevent the system from running into critical issues that lead to code 134 errors.
Will code 134 cause a WORKER TIMEOUT error? @dosu
Yes, code 134 errors can cause a WORKER TIMEOUT error. Code 134 typically indicates that a process was terminated due to an abort signal, often caused by critical issues such as a segmentation fault or an out-of-memory condition. When such a severe error occurs, it can lead to the worker process being terminated abruptly, resulting in a timeout as the system waits for a response that never comes. This is why reducing concurrency, disabling multithreading, and ensuring sufficient timeout settings are crucial to mitigate these issues.
Hey, I made this flow a while ago and had these issues. That's why there is the start index to process in batches.
However, you can try to set another `.env` variable like this:

```
LANGFLOW_FRONTEND_TIMEOUT=6000
```

Try this and check if it works.
@vasconceloscezar I tried the variable setting and didn't find any significant change, even when I increased the setting to 60000. The flow might just be too heavy for my bad hardware. Although I do wonder where the bottleneck is: it was not the system memory, but more likely the graphics memory? The loop starts up just fine, though, and the amount of data in each row is roughly the same.
Maybe the unloading of the LLM isn't clean, so each invoke eats up more and more memory?
If you have any insight on this I would really like to know. PS: would setting up langsmith/langwatch or other recommended tools help debug these kinds of errors?
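One way to test the graphics-memory theory is to log VRAM usage while the flow iterates. Below is a minimal watcher sketch, assuming `nvidia-smi` is available on the machine hosting the GPUs; run it in a separate terminal and check whether the numbers climb steadily between rows:

```python
import subprocess
import time

# Poll used VRAM per GPU every 5 seconds via nvidia-smi.
# A steadily growing value across iterations would suggest the model/context
# is not being released cleanly between invocations.
while True:
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    used_mib = [int(v) for v in out.split()]
    print(time.strftime("%H:%M:%S"), "VRAM used (MiB) per GPU:", used_mib)
    time.sleep(5)
```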
@dosu explain what timeout and frontend timeout are, and how to change these settings
In Langflow, there are two main timeout settings you can configure: `worker_timeout` and `frontend_timeout`.

Worker Timeout (`worker_timeout`):
This defaults to `300` seconds and is defined in the `Settings` class located in `src/backend/base/langflow/services/settings/base.py`:

```python
class Settings(BaseSettings):
    # Other settings...
    worker_timeout: int = 300
    """Timeout for the API calls in seconds."""
    # Other settings...
```

When running Langflow via the CLI, you can also use the `--timeout` option, which defaults to `60` seconds (for example, `langflow run --timeout 180` raises the worker timeout to 180 seconds):

- `--timeout`: Sets the worker timeout in seconds. The default is `60`.
Frontend Timeout (`frontend_timeout`):
This defaults to `0` seconds, which means no timeout. It is defined in the `Settings` class as well:

```python
class Settings(BaseSettings):
    # Other settings...
    frontend_timeout: int = 0
    """Timeout for the frontend API calls in seconds."""
    # Other settings...
```
To change these settings, you can either modify the `.env` file or set the environment variables directly in your system. For example, to set the `frontend_timeout` to `6000` seconds, you can add the following line to your `.env` file:

```
LANGFLOW_FRONTEND_TIMEOUT=6000
```

Or, you can set it directly in your terminal before running Langflow:

```
export LANGFLOW_FRONTEND_TIMEOUT=6000
```
Regarding your concern about memory management and the unloading of LLMs, Langflow handles memory management for LLM invocations using the `fix_memory_inputs` function and has a cleanup routine implemented via the `flush_langfuse_callback_if_present` function. This ensures that memory inputs are managed and any `langfuse` callbacks are properly flushed, aiding in memory management and cleanup [1].
If you suspect that the issue might be related to memory leaks or performance bottlenecks, using built-in telemetry and logging tools in Langflow can help diagnose these issues. Langflow collects various performance metrics and provides a logs page to inspect component executions and message history, which can be useful for debugging [2][3][4].
If you need further assistance, feel free to ask!
Is this still happening in the latest langflow version?
I just iterated through a CSV with 30 rows, and everything worked as expected.
Can you please test it on the main branch?
@vasconceloscezar Currently I'm on langflow 1.0.17 and ollama 0.3.9.
I tried to increase the `--timeout` setting in the CLI to 180 and set `LANGFLOW_FRONTEND_TIMEOUT=6000` in the env file, but it makes the situation worse (15-20 rows down to less than 10).
I tried updating Ollama to the newest version and switching between different models (llama3, 3.1, mistral-nemo, phi3); no matter whether the model is big or small, it always stops iterating at around 15-20 rows.
I suspect it may be caused by ollama, as some other users also experience a similar issue.
There is no crash, it just stops generating and hangs, without an error code:

```
9月 04 08:55:22 llmserver ollama[221701]: [GIN] 2024/09/04 - 08:55:22 | 200 | 21.357729469s | 127.0.0.1 | POST "/api/chat"
9月 04 08:55:42 llmserver ollama[221701]: [GIN] 2024/09/04 - 08:55:42 | 200 | 20.279408554s | 127.0.0.1 | POST "/api/chat"
9月 04 08:55:51 llmserver ollama[221701]: [GIN] 2024/09/04 - 08:55:51 | 200 | 8.569732642s | 127.0.0.1 | POST "/api/chat"
```
Then Langflow pops out the code 134 error:
"2024-09-05 11:37:33 - DEBUG -
But changing the timeout and frontend timeout also affects the performance, so currently I'm not sure what the problem is.
This could indeed be an issue with ollama; I'll try to set up ollama here and give it a try.
Could you test with a cheaper model like gpt 4 mini? Just to see if it works?
@vasconceloscezar Due to limited hardware, all my models (llama3, 3.1, mistral-nemo, phi3) are the smaller ones: phi3:3.8b (2.2 GB), llama3:8b (4.7 GB), mistral-nemo 23b (7.1 GB), all quite cheap models.
My setup is 2 x 1060 6GB, so the model won't exceed the memory of a single card, and definitely won't exceed both cards. The model response time of my CSV iterate prompt averages around 20 seconds in a successful iteration; when it suddenly stops, it usually stops after less than 10 seconds of processing.
@DDT-scrip
Do you need any assistance with this case? If not, please let us know if this issue can be closed.
@carlosrcoelho I currently moved the loop out of the Langflow system, calling it via the API to work around the issue (a rough sketch of this approach is below). The loop still stops, but only after hundreds of iterations and hours of running (still with the WORKER TIMEOUT (pid:249640), Worker (pid:) was sent code 134! issue, though), so the issue still exists; moving the loop out to an exterior Python script just handles it much better.
And it makes me suspect even more that it's caused by ollama.
I think this issue can be closed, and thanks for your help
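For reference, the external-loop approach described above could look roughly like the sketch below. The endpoint path, payload shape, flow id, and CSV column name are assumptions (Langflow 1.0.x exposes a run API, but check your instance's API docs for the exact contract):

```python
import csv

import requests  # third-party: pip install requests

# Hypothetical flow id and endpoint; adjust to match your Langflow instance.
RUN_URL = "http://localhost:7860/api/v1/run/<your-flow-id>"

with open("input.csv", newline="") as f:
    for i, row in enumerate(csv.DictReader(f)):
        resp = requests.post(
            RUN_URL,
            # "text" is a hypothetical column name; the payload keys follow the
            # commonly documented run-API shape and may differ per version.
            json={"input_value": row["text"], "input_type": "chat", "output_type": "chat"},
            timeout=300,  # generous per-row timeout instead of one long-lived worker call
        )
        resp.raise_for_status()
        print(f"row {i} done")
```

Keeping each row as its own short API call avoids one long-running build inside a single worker, which may be why this setup survives far more iterations before hitting the code 134 / WORKER TIMEOUT condition.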
Bug Description
While iterating through a CSV file using the NamastexLabs LLM Data Iterator, once the iteration count reaches a certain threshold after a few successful iterations, errors suddenly appear:
```
CRITICAL 2024-08-21 11:28:46 - CRITICAL - glogging - WORKER TIMEOUT (pid:50230) glogging.py:269
ERROR 2024-08-21 11:28:46 - ERROR - glogging - Worker (pid:50230) was sent code 134! glogging.py:272
```
while the UI shows the error message "Error Building Component: Component Ollama has not been built yet"
or "Error Building Component: Component CSV Loader has not been built yet".
Increasing the `langflow run --timeout` setting didn't solve the problem.
Reproduction
LLM Iteration error recreation.json

1. Upload a CSV with over 20 rows and a long paragraph in each row.
2. Set the Iteration Count to a large number, or 0, to go through all rows.
3. Save the output CSV file to a path.
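For convenience, a throwaway script that generates a CSV matching these reproduction conditions (the file name, column name, and filler text are arbitrary):

```python
import csv

# Generate a 30-row CSV where every row contains a long paragraph,
# mirroring the reproduction conditions above.
with open("repro.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["text"])
    for i in range(30):
        writer.writerow([f"Row {i}: " + "This is a long filler sentence for testing. " * 50])
```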
Expected behavior
The flow can iterate through a large number of rows without errors appearing.
Who can help?
No response
Operating System
Ubuntu 22.04.4 LTS
Langflow Version
1.0.5
Python Version
3.10
Screenshot
No response
Flow File
LLM Iteration error recreation.json