microsoft / LMOps

General technology for enabling AI capabilities w/ LLMs and MLLMs
https://aka.ms/GeneralAI
MIT License
3.44k stars, 256 forks

I am getting "NameError: name 'overall_cls' is not defined" error when I run python raw2read.py #80

Closed rajvadiyala closed 9 months ago

rajvadiyala commented 9 months ago

Hello all, when I run python raw2read.py I get the error "NameError: name 'overall_cls' is not defined". Part of the log is below. Please help me fix this issue.

PS C:\Users\rajas\Desktop\AI_Research\LMOps-main\LMOps-main\adaptllm> python raw2read.py
max_workers: 12
loading raw texts in the input folder...
paths: ['./data_samples/input-raw-texts\0.txt', './data_samples/input-raw-texts\1.txt', './data_samples/input-raw-texts\10.txt', './data_samples/input-raw-texts\11.txt', './data_samples/input-raw-texts\2.txt', './data_samples/input-raw-texts\3.txt', './data_samples/input-raw-texts\4.txt', './data_samples/input-raw-texts\5.txt', './data_samples/input-raw-texts\6.txt', './data_samples/input-raw-texts\7.txt', './data_samples/input-raw-texts\8.txt', './data_samples/input-raw-texts\9.txt']
12it [00:00, ?it/s]
transferring raw texts into reading comprehension...
0%| | 0/12 [00:00<?, ?it/s]
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\process.py", line 256, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\process.py", line 205, in _process_chunk
    return [fn(*args) for args in chunk]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\process.py", line 205, in <listcomp>
    return [fn(*args) for args in chunk]
            ^^^^^^^^^
  File "C:\Users\rajas\Desktop\AI_Research\LMOps-main\LMOps-main\adaptllm\raw2read.py", line 19, in search
    context_wo_title = overall_cls.truncate_sentence(context_wo_title, max_len=overall_cls.max_seq_len-200)
                       ^^^^^^^^^^^
NameError: name 'overall_cls' is not defined

Thanks in advance

cdxeve commented 9 months ago

Hi, thanks for the info. I reran the code but did not encounter the same error. You could diff your "raw2read.py" against the one in our repo to see whether anything has changed, or you could try upgrading your "concurrent" packages.

rajvadiyala commented 9 months ago

I git-cloned the repo and am using VS Code. My Python version is 3.11.

Here is the troubleshooting I did:

I am trying to run a python code and I am getting this error:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\rajas\Desktop\LMOps-main\adaptllm\raw2read.py", line 88, in <module>
    read_compre =list(process_map(search, raw_texts, max_workers=max_workers, chunksize=8192))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\site-packages\tqdm\contrib\concurrent.py", line 105, in process_map
    return _executor_map(ProcessPoolExecutor, fn, *iterables, **tqdm_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\site-packages\tqdm\contrib\concurrent.py", line 51, in _executor_map
    return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\site-packages\tqdm\std.py", line 1182, in __iter__
    for obj in iterable:
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\process.py", line 602, in _chain_from_iterable_of_lists
    for element in iterable:
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\_base.py", line 619, in result_iterator
    yield _result_or_cancel(fs.pop())
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\_base.py", line 317, in _result_or_cancel
    return fut.result(timeout)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\_base.py", line 401, in __get_result
    raise self._exception
NameError: name 'overall_cls' is not defined

ChatGPT: The error you're encountering is a NameError which indicates that a name (in this case, overall_cls) is not recognized in the current scope of your code.

Here's the relevant part of the traceback:

NameError: name 'overall_cls' is not defined

This suggests that somewhere in your code (or potentially in a function or method you're calling), you are trying to reference a variable or function named overall_cls, but it hasn't been defined.

To resolve this issue:

Check the spelling: Ensure that you've spelled the variable/function name correctly everywhere it's used.

Check the scope: Ensure that the variable/function overall_cls is defined and is accessible in the scope where you're trying to use it.

Import missing modules: If overall_cls is part of an external module, ensure that you've imported the module correctly.

If you can provide the relevant sections of your code, especially the parts that reference overall_cls, I can provide more specific guidance.

Will you be kind enough to look into the folder that has the above code, tell me what is going wrong, and suggest steps to fix the issue?

ChatGPT: Of course! Please upload the folder or the relevant code files, and I'll take a look to help you identify and address the issue.

[Uploaded files: comparison.png, raw2read.py, README.md, requirements.txt]

ChatGPT: Thank you for providing the files. I'll start by examining the raw2read.py file since it's the one mentioned in the traceback. I'll look for any references to overall_cls and try to understand the context in which it's being used to diagnose the issue.

From the initial part of the raw2read.py file, I see that the overall_cls variable is referenced within the search function:

# truncate the context to meet the max_seq_len
context_wo_title = overall_cls.truncate_sentence(context_wo_title, max_len=overall_cls.max_seq_len-200)

It seems that overall_cls is expected to be an object that provides methods and properties related to text truncation. However, within the scope of the search function, overall_cls is not defined.

To fully diagnose the issue, I need to check if overall_cls is defined elsewhere in the file or if it's supposed to be passed as an argument to the search function or accessed in some other way.

Let's continue analyzing the file to understand how overall_cls is intended to be used.

The variable overall_cls is indeed defined later in the code:

overall_cls = type_map.cls_dic['overall']()
overall_cls.init_spm(args.general_spm_path, args.domain_spm_path)

However, the search function, which tries to access overall_cls, does not have visibility of this variable since it's defined outside of its scope. The function attempts to use it without it being passed as an argument or defined within its local scope, leading to the NameError.

To fix the issue, you have a few options:

Pass overall_cls as an Argument: Modify the search function definition to accept overall_cls as an argument and pass it when calling the function.

Use a Global Variable: If you intend overall_cls to be a global variable that should be accessible throughout the script, you can declare it as global inside the search function. However, this approach is generally discouraged due to potential side effects and reduced code clarity.

Given the structure of your code, I'd recommend the first approach: passing overall_cls as an argument to the search function. Would you like me to demonstrate how to make this change?

Give me modified code to paste or save

ChatGPT: Sure! I'll provide a modified version of the search function to accept overall_cls as an argument, and I'll also modify the call to process_map to pass overall_cls to the search function. Here's the modified code:

Modify the search function definition:

def search(entry, overall_cls): ...

Modify the call to process_map:

read_compre = list(process_map(lambda entry: search(entry, overall_cls), raw_texts, max_workers=max_workers, chunksize=8192))

Here's the modified version of the relevant parts of your code:

I've made the necessary modifications. You can download the modified raw2read.py file here.

Replace your original file with this modified version, and the NameError should be resolved.

max_workers: 12
loading raw texts in the input folder...
paths: ['./data_samples/input-raw-texts\0.txt', './data_samples/input-raw-texts\1.txt', './data_samples/input-raw-texts\10.txt', './data_samples/input-raw-texts\11.txt', './data_samples/input-raw-texts\2.txt', './data_samples/input-raw-texts\3.txt', './data_samples/input-raw-texts\4.txt', './data_samples/input-raw-texts\5.txt', './data_samples/input-raw-texts\6.txt', './data_samples/input-raw-texts\7.txt', './data_samples/input-raw-texts\8.txt', './data_samples/input-raw-texts\9.txt']
12it [00:00, 11567.83it/s]
transferring raw texts into reading comprehension...
0%| | 0/12 [00:00<?, ?it/s]
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\queues.py", line 244, in _feed
    obj = _ForkingPickler.dumps(obj)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function <lambda> at 0x0000021EA573A200>: attribute lookup <lambda> on __main__ failed
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\rajas\Desktop\LMOps-main\adaptllm\raw2read.py", line 88, in <module>
    read_compre = list(process_map(lambda entry: search(entry, overall_cls), raw_texts, max_workers=max_workers, chunksize=8192))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\site-packages\tqdm\contrib\concurrent.py", line 105, in process_map
    return _executor_map(ProcessPoolExecutor, fn, *iterables, **tqdm_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\site-packages\tqdm\contrib\concurrent.py", line 51, in _executor_map
    return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\site-packages\tqdm\std.py", line 1182, in __iter__
    for obj in iterable:
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\process.py", line 602, in _chain_from_iterable_of_lists
    for element in iterable:
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\_base.py", line 619, in result_iterator
    yield _result_or_cancel(fs.pop())
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\_base.py", line 317, in _result_or_cancel
    return fut.result(timeout)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\_base.py", line 401, in __get_result
    raise self._exception
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\queues.py", line 244, in _feed
    obj = _ForkingPickler.dumps(obj)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function <lambda> at 0x0000021EA573A200>: attribute lookup <lambda> on __main__ failed

ChatGPT: The new error is related to the use of the lambda function with the ProcessPoolExecutor. The issue arises because the lambda function cannot be pickled (i.e., serialized) for the inter-process communication that occurs with ProcessPoolExecutor.

To address this, you can define a new function that wraps around the search function with the required arguments, rather than using a lambda function.

Here's what we can do:

Define a new function called search_wrapper that takes just the entry argument and calls search(entry, overall_cls) inside.

Replace the lambda in the process_map call with this search_wrapper function.

Let's make these modifications:
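As an aside for readers of this thread, the pickling limitation described above can be checked without starting any worker processes. The sketch below is illustrative only: `can_pickle` is a hypothetical helper, and `operator.mul` stands in for an importable module-level function; neither comes from raw2read.py.

```python
import operator
import pickle
from functools import partial


def can_pickle(obj) -> bool:
    """Return True if obj survives pickle.dumps, the step a process pool performs."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A lambda has no importable name, so pickle cannot store a reference to it.
lambda_ok = can_pickle(lambda x: x * 3)

# A partial over an importable function pickles as "function reference + bound arguments".
triple = partial(operator.mul, 3)
partial_ok = can_pickle(triple)
restored = pickle.loads(pickle.dumps(triple))

print(lambda_ok, partial_ok, restored(4))
```

This only shows why the lambda was rejected; it does not by itself make a global such as overall_cls visible inside worker processes.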

I've made the necessary modifications. You can download the updated raw2read.py file here.

Replace your original file with this updated version, and it should resolve the pickling issue related to the lambda function.

max_workers: 12
loading raw texts in the input folder...
paths: ['./data_samples/input-raw-texts\0.txt', './data_samples/input-raw-texts\1.txt', './data_samples/input-raw-texts\10.txt', './data_samples/input-raw-texts\11.txt', './data_samples/input-raw-texts\2.txt', './data_samples/input-raw-texts\3.txt', './data_samples/input-raw-texts\4.txt', './data_samples/input-raw-texts\5.txt', './data_samples/input-raw-texts\6.txt', './data_samples/input-raw-texts\7.txt', './data_samples/input-raw-texts\8.txt', './data_samples/input-raw-texts\9.txt']
12it [00:00, 7837.38it/s]
transferring raw texts into reading comprehension...
0%| | 0/12 [00:00<?, ?it/s]
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\process.py", line 256, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\process.py", line 205, in _process_chunk
    return [fn(*args) for args in chunk]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\process.py", line 205, in <listcomp>
    return [fn(*args) for args in chunk]
            ^^^^^^^^^
  File "C:\Users\rajas\Desktop\LMOps-main\adaptllm\raw2read_modified_v2.py", line 12, in search_wrapper
    return search(entry, overall_cls)
                         ^^^^^^^^^^^
NameError: name 'overall_cls' is not defined
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\rajas\Desktop\LMOps-main\adaptllm\raw2read_modified_v2.py", line 93, in <module>
    read_compre = list(process_map(search_wrapper, raw_texts, max_workers=max_workers, chunksize=8192))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\site-packages\tqdm\contrib\concurrent.py", line 105, in process_map
    return _executor_map(ProcessPoolExecutor, fn, *iterables, **tqdm_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\site-packages\tqdm\contrib\concurrent.py", line 51, in _executor_map
    return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\site-packages\tqdm\std.py", line 1182, in __iter__
    for obj in iterable:
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\process.py", line 602, in _chain_from_iterable_of_lists
    for element in iterable:
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\_base.py", line 619, in result_iterator
    yield _result_or_cancel(fs.pop())
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\_base.py", line 317, in _result_or_cancel
    return fut.result(timeout)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rajas\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\_base.py", line 401, in __get_result
    raise self._exception
NameError: name 'overall_cls' is not defined

ChatGPT: I apologize for the oversight. The problem persists because overall_cls is not accessible in the separate processes spawned by ProcessPoolExecutor.

To resolve this, we need to pass overall_cls as part of the data being mapped over. One way to do this is to modify the data passed to process_map to include both the entry and overall_cls for each item.

Here's the plan:

Instead of passing raw_texts directly to process_map, we'll pass a list of tuples, where each tuple contains an entry and overall_cls.

Modify the search_wrapper function to unpack the tuple and call search with the appropriate arguments.

Let's make these modifications.

I've made the further necessary modifications. You can download the newest version of raw2read.py here.

Replace your original file with this latest version, and it should address the issue with the overall_cls not being accessible in the worker processes.

  File "C:\Users\rajas\Desktop\LMOps-main\adaptllm\raw2read_modified_v3.py", line 98
    print('saving reading comprehension texts...')
IndentationError: unexpected indent

ChatGPT: I apologize for the oversight. The error message indicates that there's an indentation issue in the code, specifically around the line that says print('saving reading comprehension texts...').

Let's correct this and provide you with the updated file.

I've corrected the indentation issue. You can download the updated raw2read.py file here.

Replace your original file with this latest version, and it should resolve the indentation error.

  File "C:\Users\rajas\Desktop\LMOps-main\adaptllm\raw2read_modified_v4.py", line 100
    for entry in read_compre:
IndentationError: unexpected indent

ChatGPT: I apologize for the inconvenience. Let's address this additional indentation issue.

I will correct the indentation for the line for entry in read_compre: and any subsequent lines that may be affected.

I've corrected the additional indentation issue. You can download the latest version of raw2read.py here.

Replace your original file with this version, and it should address the IndentationError.

You can't do it. Same error.

ChatGPT: I apologize for the inconvenience. Let's fix this issue once and for all.

To ensure I address the problem correctly, I will explicitly look at the section of code around the line causing the error, adjust the indentation, and share that portion with you for confirmation before creating the entire file.

The displayed code section seems to have inconsistent indentation. Specifically, the lines following the print('saving reading comprehension texts...') statement are indented more than they should be.

Here's the corrected section:

read_compre = list(process_map(search_wrapper, raw_texts_with_overall_cls, max_workers=max_workers, chunksize=8192))

print('saving reading comprehension texts...')
# sort by text_id to align with the order of raw texts
for entry in read_compre:
    path = os.path.join(args.output_dir, entry["file_name"])
    with open(path, 'w') as f:
        f.write(entry['read_compre'])
        f.close()

print(f'saved to {args.output_dir}')

I'll replace this corrected section in the entire file and provide you with the updated file.

I've made the necessary corrections to the indentation. You can download the updated raw2read.py file here.

Replace your original file with this latest version, and it should resolve the IndentationError.

I am trying to follow the steps shown on this page: https://github.com/microsoft/LMOps/tree/main/adaptllm and I am getting the above errors. Please tell me exactly what I should do. I went to the adaptllm folder, installed requirements.txt, and started getting errors when I tried to execute raw2read.py.

ChatGPT: Thank you for sharing the context. I understand the frustration, and I'll guide you through the process step by step.

The repository you've shared is Microsoft's Language Model Operations (LMOps) framework for adapting and deploying large language models. The errors you're facing arise from the raw2read.py script, which converts raw text to a reading comprehension format.

From the details you've provided and the code fixes we've worked on, here's a recommended approach:

Dependencies: Make sure all the dependencies are correctly installed. Navigate to the adaptllm folder and run:

pip install -r requirements.txt

Replace raw2read.py: Use the updated raw2read.py file I provided. Download it and replace the original raw2read.py in the adaptllm folder.

Run the Script: Once you've replaced the file, try running the script again:

python raw2read.py [your arguments here]

Make sure to replace [your arguments here] with the appropriate arguments needed by the script.

Check Other Scripts: If you intend to run other scripts from the LMOps repository, it might be a good idea to check them for potential issues as well. Given that you encountered errors with raw2read.py, there might be other scripts that require updates or fixes.

Python Version: Ensure you're using a compatible Python version. Some code might behave differently across Python versions. The repository doesn't explicitly mention a version, but since your error logs showed Python 3.11, it might be worth trying with a more commonly used version like Python 3.8 or 3.9, in case there are compatibility issues.

GitHub Issues: If you continue to face challenges, consider checking the "Issues" section of the GitHub repository. Others might have encountered similar problems, and solutions or workarounds could be discussed there. If not, you can create a new issue detailing the problems you're facing.

Following these steps should help you progress with the LMOps framework. If you encounter further issues, please let me know, and I'll assist you.

Thanks in advance

cdxeve commented 9 months ago

Thanks for your reply, but I have some difficulty understanding your message (it seems to be a generated response from ChatGPT). Could you please be more specific?

shamanez commented 9 months ago

@cdxeve

I also get the same error.

context_wo_title = overall_cls.truncate_sentence(context_wo_title, max_len=overall_cls.max_seq_len-200)
                       ^^^^^^^^^^^
NameError: name 'overall_cls' is not defined

I am trying to execute the code with the given examples.

cdxeve commented 9 months ago

@shamanez Hi, what is your python version? Have you ever tried "pip install --upgrade tqdm"?

buaahsh commented 9 months ago

@rajvadiyala @shamanez We have fixed this problem by adding functools in process_map
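
For readers landing here with the same NameError: one plausible reading of "adding functools in process_map" is that the shared object is bound to the worker function with functools.partial, so it travels to each worker process as an argument instead of being looked up as a global. The following is a minimal sketch of that idea, not the repository's actual patch. `Truncator` and its methods are hypothetical stand-ins for whatever overall_cls is in raw2read.py, and the standard library's ProcessPoolExecutor (which the tracebacks above show tqdm's process_map calling) is used directly.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial


class Truncator:
    """Hypothetical stand-in for the object the script calls overall_cls."""

    def __init__(self, max_seq_len):
        self.max_seq_len = max_seq_len

    def truncate_sentence(self, text, max_len):
        # Keep at most max_len whitespace-separated tokens.
        return " ".join(text.split()[:max_len])


def search(entry, overall_cls):
    # overall_cls arrives as an argument, so the worker never looks it up as a global.
    return overall_cls.truncate_sentence(entry, max_len=overall_cls.max_seq_len)


def run(raw_texts, overall_cls, max_workers=2):
    # A partial over a module-level function can be pickled, unlike a lambda.
    worker = partial(search, overall_cls=overall_cls)
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(worker, raw_texts))


if __name__ == "__main__":
    print(run(["a b c d e", "f g"], Truncator(max_seq_len=3)))
```

The `__main__` guard matters on Windows, where worker processes are started by re-importing the script: anything created only under the guard does not exist in the workers, which would be consistent with the NameError reported in this thread.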

shamanez commented 9 months ago

Nice, yeah. I guess the problem was with functools.

On Thu, 26 Oct 2023 at 6:33 PM, Shaohan Huang @.***> wrote:

@rajvadiyala @shamanez We have fixed this problem by adding functools in process_map
