tech-srl / code2vec

TensorFlow code for the neural network presented in the paper: "code2vec: Learning Distributed Representations of Code"
https://code2vec.org
MIT License

Problem when running preprocess.sh #70

Closed brash6 closed 4 years ago

brash6 commented 4 years ago

Hello,

First of all, thank you very much for this really interesting project. I have an issue when running preprocess.sh on the unprocessed java-small dataset that you provided. When I run it, I get this error:

$ bash preprocess.sh
Extracting paths from validation set...
Finished extracting paths from validation set
Extracting paths from test set...
Finished extracting paths from test set
Extracting paths from training set...

Finished extracting paths from training set
Creating histograms from the training data
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\framework\dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\framework\dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\framework\dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\framework\dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\framework\dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\framework\dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
File: my_dataset.test.raw.txt
Traceback (most recent call last):
  File "preprocess.py", line 134, in <module>
    max_contexts=int(args.max_contexts))
  File "preprocess.py", line 68, in process_file
    print('Average total contexts: ' + str(float(sum_total) / total))
ZeroDivisionError: float division by zero

Thank you for your help
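
A note on the traceback above: the division on line 68 of preprocess.py can only fail this way when total is 0, which would be consistent with the extraction step having written no examples to the raw file. A minimal sketch of a guard, assuming sum_total and total are plain counters as the traceback suggests; average_or_none and report are hypothetical helpers, not functions from the repository:

```python
from typing import Optional


def average_or_none(sum_total: float, total: int) -> Optional[float]:
    """Return sum_total / total, or None when no examples were counted."""
    if total == 0:
        return None
    return float(sum_total) / total


def report(sum_total: float, total: int) -> str:
    """Build the summary line, pointing at the likely cause when input is empty."""
    avg = average_or_none(sum_total, total)
    if avg is None:
        return 'No examples found; was the path extraction output empty?'
    return 'Average total contexts: ' + str(avg)


print(report(0, 0))   # empty input: explanatory message instead of a crash
print(report(30, 4))  # non-empty input: the usual average line
```

A guard like this does not fix the underlying problem, but it makes an empty raw file show up as a readable message rather than a ZeroDivisionError.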

urialon commented 4 years ago

Hi, Thank you for your interest in this project!

  1. What is your TensorFlow version?
  2. What is your numpy version?
  3. As I understand, you are running on Windows?
  4. Do you have java installed? Please run java --version and copy the output.

Best, Uri

brash6 commented 4 years ago

Here are my configs :

I should have mentioned that I get the error above when running in Git Bash. When I run from the classic Command Prompt, I get this error instead:

E:\code2vec>bash preprocess.sh
Extracting paths from validation set...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "JavaExtractor/extract.py", line 24, in ParallelExtractDir
    ExtractFeaturesForDir(args, dir, "")
  File "JavaExtractor/extract.py", line 38, in ExtractFeaturesForDir
    sleeper = subprocess.Popen(command, stdout=outputFile, stderr=subprocess.PIPE)
  File "/usr/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1344, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'java': 'java'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "JavaExtractor/extract.py", line 98, in <module>
    ExtractFeaturesForDirsList(args, to_extract)
  File "JavaExtractor/extract.py", line 69, in ExtractFeaturesForDirsList
    p.starmap(ParallelExtractDir, zip(itertools.repeat(args), dirs))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 296, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 670, in get
    raise self._value
FileNotFoundError: [Errno 2] No such file or directory: 'java': 'java'
Finished extracting paths from validation set
Extracting paths from test set...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "JavaExtractor/extract.py", line 24, in ParallelExtractDir
    ExtractFeaturesForDir(args, dir, "")
  File "JavaExtractor/extract.py", line 38, in ExtractFeaturesForDir
    sleeper = subprocess.Popen(command, stdout=outputFile, stderr=subprocess.PIPE)
  File "/usr/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1344, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'java': 'java'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "JavaExtractor/extract.py", line 98, in <module>
    ExtractFeaturesForDirsList(args, to_extract)
  File "JavaExtractor/extract.py", line 69, in ExtractFeaturesForDirsList
    p.starmap(ParallelExtractDir, zip(itertools.repeat(args), dirs))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 296, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 670, in get
    raise self._value
FileNotFoundError: [Errno 2] No such file or directory: 'java': 'java'
Finished extracting paths from test set
Extracting paths from training set...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "JavaExtractor/extract.py", line 24, in ParallelExtractDir
    ExtractFeaturesForDir(args, dir, "")
  File "JavaExtractor/extract.py", line 38, in ExtractFeaturesForDir
    sleeper = subprocess.Popen(command, stdout=outputFile, stderr=subprocess.PIPE)
  File "/usr/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1344, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'java': 'java'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "JavaExtractor/extract.py", line 98, in <module>
    ExtractFeaturesForDirsList(args, to_extract)
  File "JavaExtractor/extract.py", line 69, in ExtractFeaturesForDirsList
    p.starmap(ParallelExtractDir, zip(itertools.repeat(args), dirs))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 296, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 670, in get
    raise self._value
FileNotFoundError: [Errno 2] No such file or directory: 'java': 'java'
Finished extracting paths from training set
Creating histograms from the training data
Traceback (most recent call last):
  File "preprocess.py", line 3, in <module>
    import common
  File "/mnt/e/code2vec/common.py", line 2, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'

I don't understand this, because I've tried several versions of the JDK and checked that it was added to my PATH, but the error still occurs.

One more strange thing: in the preprocess.sh file, when I set the PYTHON command to "python" instead of "python3", I get a python command not found error. This is odd because in my command prompt the python3 command isn't recognized but python is.
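
The /usr/lib/python3.6 and /mnt/e paths in the tracebacks above suggest that bash started from the Command Prompt runs in a separate Linux environment (such as WSL) with its own PATH and its own Python packages, which would explain why java and numpy are found in one shell but not in the other. A small diagnostic sketch, to be run with the same interpreter that preprocess.sh uses; describe_environment is a hypothetical helper, not part of code2vec:

```python
import importlib.util
import shutil
import sys


def describe_environment(executables=('java',), modules=('numpy',)):
    """Report which interpreter is running and what it can see."""
    report = {'interpreter': sys.executable}
    for exe in executables:
        # shutil.which returns the resolved path, or None if not on PATH.
        report['exe:' + exe] = shutil.which(exe)
    for mod in modules:
        # find_spec returns None when a top-level module cannot be imported.
        report['module:' + mod] = importlib.util.find_spec(mod) is not None
    return report


if __name__ == '__main__':
    for key, value in describe_environment().items():
        print(key, '->', value)
```

If the java entry is None or the numpy entry is False in the shell where preprocess.sh is launched, that shell needs its own Java and numpy installation, regardless of what the Windows PATH contains.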

urialon commented 4 years ago

Can you try to run the java process directly:

java -cp JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar JavaExtractor.App --max_path_length 8 --max_path_width 2 --dir JavaExtractor/JPredict/src/main

brash6 commented 4 years ago

It seems to work :

E:\code2vec>java -cp JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar JavaExtractor.App --max_path_length 8 --max_path_width 2 --dir JavaExtractor/JPredict/src/main
set|no|hash void,1726006538,METHOD_NAME void,-499095939,shasher void,1814237660,s void,-218942311,s METHOD_NAME,1218914318,shasher METHOD_NAME,-1789155605,s METHOD_NAME,-106192278,s shasher,1630270600,s shasher,-1589593235,s s,768576465,s
to|string string,362150388,METHOD_NAME string,270643696,string string,-1728163597,sss string,88316547,msource string,88316609,getname string,270643820,mhashedpath string,-922800703,mtarget string,-922800641,getname string,270643882,format METHOD_NAME,713917609,string METHOD_NAME,-825131430,sss METHOD_NAME,-549482244,msource METHOD_NAME,-549482182,getname METHOD_NAME,713917733,mhashedpath METHOD_NAME,-1560599494,mtarget METHOD_NAME,-1560599432,getname METHOD_NAME,713917795,format string,1041219767,sss sss,-65718818,msource sss,-65718756,getname sss,234225287,mhashedpath msource,380918261,getname msource,-1273411980,mhashedpath msource,-351481911,mtarget msource,-351481849,getname getname,-1288930698,mhashedpath getname,-1567635637,mtarget getname,-1567635575,getname mhashedpath,628859521,mtarget mhashedpath,628859583,getname mhashedpath,-870983830,format mtarget,-630198989,getname mtarget,-571812044,format getname,-587330762,format
get|path jsonignore,-1057165453,string jsonignore,-733851942,METHOD_NAME string,1387642418,METHOD_NAME string,774787451,mpath METHOD_NAME,263491700,mpath
get|source jsonignore,-1057165453,property jsonignore,-733851942,METHOD_NAME property,1387642418,METHOD_NAME property,774787451,msource METHOD_NAME,263491700,msource
etc.

urialon commented 4 years ago

Can you now try running preprocess.sh with your train, test, and val paths all set to the directory JavaExtractor/JPredict/src/main?

brash6 commented 4 years ago

I get the same error again:

E:\code2vec>bash preprocess.sh
Extracting paths from validation set...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "JavaExtractor/extract.py", line 24, in ParallelExtractDir
    ExtractFeaturesForDir(args, dir, "")
  File "JavaExtractor/extract.py", line 38, in ExtractFeaturesForDir
    sleeper = subprocess.Popen(command, stdout=outputFile, stderr=subprocess.PIPE)
  File "/usr/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1344, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'java': 'java'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "JavaExtractor/extract.py", line 98, in <module>
    ExtractFeaturesForDirsList(args, to_extract)
  File "JavaExtractor/extract.py", line 69, in ExtractFeaturesForDirsList
    p.starmap(ParallelExtractDir, zip(itertools.repeat(args), dirs))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 296, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 670, in get
    raise self._value
FileNotFoundError: [Errno 2] No such file or directory: 'java': 'java'
Finished extracting paths from validation set
Extracting paths from test set...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "JavaExtractor/extract.py", line 24, in ParallelExtractDir
    ExtractFeaturesForDir(args, dir, "")
  File "JavaExtractor/extract.py", line 38, in ExtractFeaturesForDir
    sleeper = subprocess.Popen(command, stdout=outputFile, stderr=subprocess.PIPE)
  File "/usr/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1344, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'java': 'java'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "JavaExtractor/extract.py", line 98, in <module>
    ExtractFeaturesForDirsList(args, to_extract)
  File "JavaExtractor/extract.py", line 69, in ExtractFeaturesForDirsList
    p.starmap(ParallelExtractDir, zip(itertools.repeat(args), dirs))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 296, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 670, in get
    raise self._value
FileNotFoundError: [Errno 2] No such file or directory: 'java': 'java'
Finished extracting paths from test set
Extracting paths from training set...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "JavaExtractor/extract.py", line 24, in ParallelExtractDir
    ExtractFeaturesForDir(args, dir, "")
  File "JavaExtractor/extract.py", line 38, in ExtractFeaturesForDir
    sleeper = subprocess.Popen(command, stdout=outputFile, stderr=subprocess.PIPE)
  File "/usr/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1344, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'java': 'java'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "JavaExtractor/extract.py", line 98, in <module>
    ExtractFeaturesForDirsList(args, to_extract)
  File "JavaExtractor/extract.py", line 69, in ExtractFeaturesForDirsList
    p.starmap(ParallelExtractDir, zip(itertools.repeat(args), dirs))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 296, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 670, in get
    raise self._value
FileNotFoundError: [Errno 2] No such file or directory: 'java': 'java'
Finished extracting paths from training set
Creating histograms from the training data
Traceback (most recent call last):
  File "preprocess.py", line 3, in <module>
    import common
  File "/mnt/e/code2vec/common.py", line 2, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'

urialon commented 4 years ago

Is this from your standard command line, which doesn't recognize Java and numpy? If so, please try again from the other shell, the one that does recognize them.

brash6 commented 4 years ago

Yes, it was from my standard command line. I've now tried in Git Bash and I get this error:

$ bash preprocess.sh
Extracting paths from validation set...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "E:\code2vec\JavaExtractor\extract.py", line 24, in ParallelExtractDir
    ExtractFeaturesForDir(args, dir, "")
  File "E:\code2vec\JavaExtractor\extract.py", line 37, in ExtractFeaturesForDir
    with open(outputFileName, 'a') as outputFile:
FileNotFoundError: [Errno 2] No such file or directory: 'main\\java'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "JavaExtractor/extract.py", line 98, in <module>
    ExtractFeaturesForDirsList(args, to_extract)
  File "JavaExtractor/extract.py", line 69, in ExtractFeaturesForDirsList
    p.starmap(ParallelExtractDir, zip(itertools.repeat(args), dirs))
  File "C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 276, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 657, in get
    raise self._value
FileNotFoundError: [Errno 2] No such file or directory: 'main\\java'
Finished extracting paths from validation set
Extracting paths from test set...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "E:\code2vec\JavaExtractor\extract.py", line 24, in ParallelExtractDir
    ExtractFeaturesForDir(args, dir, "")
  File "E:\code2vec\JavaExtractor\extract.py", line 37, in ExtractFeaturesForDir
    with open(outputFileName, 'a') as outputFile:
FileNotFoundError: [Errno 2] No such file or directory: 'main\\java'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "JavaExtractor/extract.py", line 98, in <module>
    ExtractFeaturesForDirsList(args, to_extract)
  File "JavaExtractor/extract.py", line 69, in ExtractFeaturesForDirsList
    p.starmap(ParallelExtractDir, zip(itertools.repeat(args), dirs))
  File "C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 276, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 657, in get
    raise self._value
FileNotFoundError: [Errno 2] No such file or directory: 'main\\java'
Finished extracting paths from test set
Extracting paths from training set...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "E:\code2vec\JavaExtractor\extract.py", line 24, in ParallelExtractDir
    ExtractFeaturesForDir(args, dir, "")
  File "E:\code2vec\JavaExtractor\extract.py", line 37, in ExtractFeaturesForDir
    with open(outputFileName, 'a') as outputFile:
FileNotFoundError: [Errno 2] No such file or directory: 'main\\java'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "JavaExtractor/extract.py", line 98, in <module>
    ExtractFeaturesForDirsList(args, to_extract)
  File "JavaExtractor/extract.py", line 69, in ExtractFeaturesForDirsList
    p.starmap(ParallelExtractDir, zip(itertools.repeat(args), dirs))
  File "C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 276, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 657, in get
    raise self._value
FileNotFoundError: [Errno 2] No such file or directory: 'main\\java'
Finished extracting paths from training set
Creating histograms from the training data
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\framework\dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\framework\dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\framework\dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\framework\dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\framework\dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\framework\dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
File: my_dataset.test.raw.txt
Traceback (most recent call last):
  File "preprocess.py", line 134, in <module>
    max_contexts=int(args.max_contexts))
  File "preprocess.py", line 68, in process_file
    print('Average total contexts: ' + str(float(sum_total) / total))
ZeroDivisionError: float division by zero

Once again, when I'm on git bash, I have to set PYTHON to "python", and in the standard command line I have to set it to "python3". Otherwise I get a python "command not found" error, and I don't know why, since python is recognized in my standard command line. Thank you for your help.
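
One way to see which interpreter name the current shell can actually resolve is to ask Python's standard library. This is only a sketch: `find_python` is a hypothetical helper, not part of code2vec, and it assumes at least one Python is already runnable so the check itself can be executed.

```python
import shutil


def find_python(candidates=("python3", "python")):
    """Return the first candidate name that resolves on PATH, or None.

    Hypothetical helper for illustration; not part of code2vec.
    """
    for name in candidates:
        # shutil.which returns the full path of the executable, or None.
        if shutil.which(name):
            return name
    return None


if __name__ == "__main__":
    print("Usable value for PYTHON:", find_python())
```

Running it once in git bash and once in the standard command line shows whether the two shells see different names on their PATH.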

urialon commented 4 years ago

This looks like a problem with Windows file paths and output redirection. By the way, how are you running a bash script on Windows?

Anyway, I recommend using an Ubuntu VM or Windows 10's BashOnWindows.
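
For context, the traceback shows `open()` failing on 'main\\java', a name that still contains a Windows path separator. Assuming the output file name is derived from the last component of the directory path (an assumption, since extract.py is not shown in this thread), a separator-agnostic version could look like the sketch below. `output_name_for_dir` is a hypothetical name, not a function from the repository.

```python
import re


def output_name_for_dir(dir_path, prefix=""):
    """Return a flat file name built from the last component of dir_path.

    Hypothetical helper; it treats both '/' and '\\' as separators so the
    result never contains a directory part.
    """
    # Split on either separator and drop empty pieces left by trailing slashes.
    parts = [p for p in re.split(r"[\\/]+", dir_path) if p]
    if not parts:
        raise ValueError("empty directory path: %r" % dir_path)
    return prefix + parts[-1]


if __name__ == "__main__":
    print(output_name_for_dir("JavaExtractor\\JPredict\\src\\main\\java"))
    print(output_name_for_dir("JavaExtractor/JPredict/src/main/"))
```

Splitting with a regular expression rather than `os.path.basename` keeps the behaviour identical on Linux and Windows, because `os.path` only recognises the separators of the platform it runs on.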

brash6 commented 4 years ago

Thank you for your help, I will try to run it on BashOnWindows. I'm running the script with : bash preprocess.sh

brash6 commented 4 years ago

I've tried again to run the script in my Windows environment and I think I understood something. I changed TEST, VAL and TRAIN to JavaExtractor/JPredict/src/main/ (adding a '/' at the end) and I now get this error:

$ bash preprocess.sh
Extracting paths from validation set...
Finished extracting paths from validation set
Extracting paths from test set...
Finished extracting paths from test set
Extracting paths from training set...
Finished extracting paths from training set
Creating histograms from the training data
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\framework\dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\framework\dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\framework\dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\framework\dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\framework\dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\framework\dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
C:\Users\HMA11\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
File: my_dataset.test.raw.txt
Traceback (most recent call last):
  File "preprocess.py", line 134, in <module>
    max_contexts=int(args.max_contexts))
  File "preprocess.py", line 68, in process_file
    print('Average total contexts: ' + str(float(sum_total) / total))
ZeroDivisionError: float division by zero

It's the same error I got at the beginning, and I think the problem comes from the JavaExtractor/extract.py step, which should create a ${DATASET_NAME}.train.raw.txt file. In my case this file is created but it's empty, which could explain the ZeroDivisionError. Moreover, I have non-empty files of type 'file' for each sub-directory of the TRAIN, TEST and VAL directories.

Maybe the step that turns these files into a single ${DATASET_NAME}.train.raw.txt for TRAIN (for example) doesn't work?
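
The link between an empty raw file and the ZeroDivisionError can be made explicit with a guard around the average that the traceback points at. The sketch below is not the code in preprocess.py: `average_contexts` is a hypothetical helper, and it assumes `total` counts the examples read from the raw file.

```python
def average_contexts(sum_total, total, file_path):
    """Return sum_total / total, failing with a clear message when total is 0.

    Hypothetical helper for illustration; `total` is assumed to be the number
    of examples read from `file_path`.
    """
    if total == 0:
        # An empty raw file means path extraction produced no examples.
        raise ValueError(
            "No examples were read from %s; the file is probably empty. "
            "Check the extraction step and the input directories." % file_path
        )
    return float(sum_total) / total


if __name__ == "__main__":
    print("Average total contexts:", average_contexts(10, 4, "example.raw.txt"))
```

With a check like this, an empty file would stop the run with a message that names the file instead of a bare division error.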

urialon commented 4 years ago

Yeah, I think that something with file paths or output redirection doesn't work as expected on Windows... The code was written for linux.

brash6 commented 4 years ago

I'll try on linux, thank you for your help !

SimoneBrigante commented 4 years ago

Hi,
I think I have the same issue. I have tried to run the preprocess.sh both on the java-small dataset and on a path-based representation of a python method using JetBrains-Research's PathMiner and in both cases I got the ZeroDivisionError.
Python version: 3.6.9
Tensorflow version: 2.0.0
Java version: 11.0.6
I have tried both on macOS and an Ubuntu VM, here is the error for the second case:

$ source preprocess.sh
Extracting paths from validation set...
Finished extracting paths from validation set
Extracting paths from test set...
Finished extracting paths from test set
Extracting paths from training set...
Finished extracting paths from training set
Creating histograms from the training data
File: test_python.test.raw.txt
Traceback (most recent call last):
  File "preprocess.py", line 135, in <module>
    max_contexts=int(args.max_contexts))
  File "preprocess.py", line 69, in process_file
    print('Average total contexts: ' + str(float(sum_total) / total))
ZeroDivisionError: float division by zero

urialon commented 4 years ago

Hi @SimoneBrigante , Can you try to run the java jar directly:

java -cp JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar JavaExtractor.App --max_path_length 8 --max_path_width 2 --dir JavaExtractor/JPredict/src/main

SimoneBrigante commented 4 years ago

Hi @SimoneBrigante , Can you try to run the java jar directly:

java -cp JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar JavaExtractor.App --max_path_length 8 --max_path_width 2 --dir JavaExtractor/JPredict/src/main

Seem to work for me too:

$ java -cp JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar JavaExtractor.App --max_path_length 8 --max_path_width 2 --dir JavaExtractor/JPredict/src/main
get|leaves node,912163849,METHOD_NAME node,995298373,leaves METHOD_NAME,-16858189,leaves
get|name string,362150388,METHOD_NAME string,494437562,name METHOD_NAME,-16858189,name
get|length long,600457869,METHOD_NAME long,-335025599,length METHOD_NAME,-16858189,length
get|raw|type string,362150388,METHOD_NAME string,494437562,rawtype METHOD_NAME,-16858189,rawtype
get|type string,362150388,METHOD_NAME string,494437562,type METHOD_NAME,-16858189,type
get|name string,362150388,METHOD_NAME string,494437562,name METHOD_NAME,-16858189,name
.........

Can you now try to run preprocess.sh where the directory JavaExtractor/JPredict/src/main is set to your train and test and val paths?

I have now tried what you suggested in this message, but I still get the same ZeroDivisionError.

hsellik commented 4 years ago

@SimoneBrigante Quite a few times, I found the cause of the problem by looking at the error_log.txt, since this is where all the error messages get piped.

For this specific error, please make sure that the paths specified in preprocess.sh (TRAIN_DIR, VAL_DIR, TEST_DIR) are correct, otherwise extracting paths will seemingly "finish successfully" although the "...raw.txt" files are actually empty and that's what is causing the error. At least it was the problem in my case.

You can check the raw dataset file sizes by adding the following code to preprocess.sh, just before preprocess.py is called. This helps to rule out issues before preprocessing:

echo "Train data raw file size:"
stat --printf="%s\n" "${TRAIN_DATA_FILE}"
echo "Val data raw file size:"
stat --printf="%s\n" "${VAL_DATA_FILE}"
echo "Test data raw file size:"
stat --printf="%s\n" "${TEST_DATA_FILE}"
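
Note that `stat --printf` is a GNU coreutils option, so the check above may not run as written on macOS. A rough equivalent in Python is sketched below. `find_empty_raw_files` is a hypothetical helper, not part of code2vec, and the file names in the usage example are placeholders for whatever the raw data files are called in your setup.

```python
import os


def find_empty_raw_files(paths):
    """Return the paths that are missing or have a size of 0 bytes.

    Hypothetical helper for illustration; not part of code2vec.
    """
    return [
        p for p in paths
        if not os.path.isfile(p) or os.path.getsize(p) == 0
    ]


if __name__ == "__main__":
    # Placeholder names; substitute the actual raw data files.
    raw_files = [
        "my_dataset.train.raw.txt",
        "my_dataset.val.raw.txt",
        "my_dataset.test.raw.txt",
    ]
    for path in find_empty_raw_files(raw_files):
        print("Empty or missing:", path)
```

Treating a missing file the same as an empty one keeps the check useful when the extraction step fails before creating any output.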

urialon commented 4 years ago

Thanks @hsellik 😀 @SimoneBrigante - can you please read what @hsellik suggested, and if it doesn't work - Can you try to run preprocess.sh where the directory JavaExtractor/JPredict/src/main is set to your train and test and val paths?

SimoneBrigante commented 4 years ago

Thanks to both @hsellik and @urialon. I wasn't able to find the error_log.txt file you mentioned, but I have been able to make preprocess.sh work for Java code. Now I have a few questions about extending it to Python, but I will open another issue for those. Thanks again.

urialon commented 4 years ago

Great, I'm happy to hear!

ShaliniR11 commented 2 years ago

Hi @SimoneBrigante , I am facing the same issue as well. Can you please let me know how to resolve this error ?

Here's the output when I run the preprocess.sh script with TRAIN_DIR TEST_DIR VAL_DIR variables replaced with their actual file paths:

$ sh preprocess.sh
preprocess.sh: line 21: C:/Users/shali/Documents/Git/code2vec/data/javadata/train/subtrain/: Is a directory
preprocess.sh: line 22: C:/Users/shali/Documents/Git/code2vec/data/javadata/val/subval/: Is a directory
preprocess.sh: line 23: C:/Users/shali/Documents/Git/code2vec/data/javadata/test/subtest/: Is a directory
Extracting paths from validation set...
Finished extracting paths from validation set
Extracting paths from test set...
Finished extracting paths from test set
Extracting paths from training set...
Finished extracting paths from training set
Creating histograms from the training data
Train data raw file size:
0
Val data raw file size:
0
Test data raw file size:
0
2022-07-10 21:58:45.695955: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-07-10 21:58:45.696225: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
File: my_dataset.test.raw.txt
Traceback (most recent call last):
  File "C:\Users\shali\Documents\Git\code2vec\preprocess.py", line 133, in <module>
    num_examples = process_file(file_path=data_file_path, data_file_role=data_role, dataset_name=args.output_name,
  File "C:\Users\shali\Documents\Git\code2vec\preprocess.py", line 69, in process_file
    print('Average total contexts: ' + str(float(sum_total) / total))
ZeroDivisionError: float division by zero

urialon commented 2 years ago

Hi @ShaliniR11, can you please create a new issue, describe your system (Windows, I guess?), describe exactly what you did, and list everything you tried (including the other suggestions in this thread)?