smartshark / serverSHARK

Django-based webserver of SmartSHARK
Apache License 2.0

No such file or directory: '/srv/www/serverSHARK/plugin_output/8/8_out.txt' #38

Closed iwonajs closed 3 years ago

iwonajs commented 4 years ago

I was able to successfully install serverSHARK and two plugins, vcsSHARK and issueSHARK. Both plugins were installed through the serverSHARK admin GUI and have been activated. I tried using them both, and both give me an error: `Exception Type: FileNotFoundError at /admin/smartshark/project/job/8/output Exception Value: [Errno 2] No such file or directory: '/srv/www/serverSHARK/plugin_output/8/8_out.txt'`

Note that I also installed issueSHARK as a command line tool and I am able to download the issue data.

Error output:

```
Environment:

Request Method: GET
Request URL: http://192.168.1.115:8000/admin/smartshark/project/job/8/output

Django Version: 1.11.29
Python Version: 3.6.9
Installed Applications:
['smartshark.apps.ServersharkConfig',
 'suit',
 'django_filters',
 'bootstrap3',
 'progressbarupload',
 'django.contrib.admin',
 'django.contrib.auth',
 'django.contrib.contenttypes',
 'django.contrib.sessions',
 'django.contrib.messages',
 'django.contrib.staticfiles']
Installed Middleware:
['django.middleware.security.SecurityMiddleware',
 'django.contrib.sessions.middleware.SessionMiddleware',
 'django.middleware.common.CommonMiddleware',
 'django.middleware.csrf.CsrfViewMiddleware',
 'django.contrib.auth.middleware.AuthenticationMiddleware',
 'django.contrib.auth.middleware.SessionAuthenticationMiddleware',
 'django.contrib.messages.middleware.MessageMiddleware',
 'django.middleware.clickjacking.XFrameOptionsMiddleware']

Traceback:

File "/srv/www/serverSHARK/lib/python3.6/site-packages/django/core/handlers/exception.py" in inner
  response = get_response(request)

File "/srv/www/serverSHARK/lib/python3.6/site-packages/django/core/handlers/base.py" in _legacy_get_response
  response = self._get_response(request)

File "/srv/www/serverSHARK/lib/python3.6/site-packages/django/core/handlers/base.py" in _get_response
  response = self.process_exception_by_middleware(e, request)

File "/srv/www/serverSHARK/lib/python3.6/site-packages/django/core/handlers/base.py" in _get_response
  response = wrapped_callback(request, *callback_args, **callback_kwargs)

File "/srv/www/serverSHARK/smartshark/views/common.py" in job_output
  output = interface.get_output_log(job)

File "/srv/www/serverSHARK/smartshark/datacollection/localqueueconnector.py" in get_output_log
  return self._get_log_file(job, 'out')

File "/srv/www/serverSHARK/smartshark/datacollection/localqueueconnector.py" in _get_log_file
  with open(os.path.join(plugin_execution_outputpath, str(job.pk) + '_' + log_type + '.txt'), 'r') as f:

Exception Type: FileNotFoundError at /admin/smartshark/project/job/8/output
Exception Value: [Errno 2] No such file or directory: '/srv/www/serverSHARK/plugin_output/8/8_out.txt'
```
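For reference, the path the view tries to open can be reconstructed from the traceback. A minimal sketch (function and argument names here are hypothetical; the real logic lives in `localqueueconnector.py`'s `_get_log_file`):

```python
import os

def build_log_path(output_root, execution_id, job_pk, log_type):
    # Hypothetical reconstruction of the path built in _get_log_file:
    # <output_root>/<execution_id>/<job_pk>_<log_type>.txt
    filename = "%s_%s.txt" % (job_pk, log_type)
    return os.path.join(output_root, str(execution_id), filename)

# Reproduces the path from the report:
# /srv/www/serverSHARK/plugin_output/8/8_out.txt
path = build_log_path("/srv/www/serverSHARK/plugin_output", 8, 8, "out")
```

The `FileNotFoundError` means nothing has written that file yet; the path construction itself is fine.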

atrautsch commented 4 years ago

Is the worker process running? This could be a case of the serverSHARK admin gui process running but not the worker.

iwonajs commented 4 years ago

  1. I created a clean install.

  2. Added the vcsSHARK plugin.

  3. Added the issueSHARK plugin.

  4. Created a new project: spring-boot.

  5. Started collecting commit data using the vcsSHARK plugin. Once the job was added, I went and added some more cores for the job, i.e., I modified the job.

  6. Navigated to project > plugin executions > show jobs > show output. Error message: `FileNotFoundError at /admin/smartshark/project/job/1/output [Errno 2] No such file or directory: '/srv/www/serverSHARK/plugin_output/1/1_out.txt'`

  7. Then I took a screenshot of the terminal where I started the server with `python manage.py runserver 0.0.0.0:8000`. [screenshot]

  8. Next I refreshed the dashboard and noticed that no new files/commits are being added. [screenshot]

  9. Then I checked the job status in serverSHARK to see if it had changed, but it was the same as at the beginning of the job. [screenshot]

  10. For the git repository I was trying to extract (spring-boot), I verified that there are 26,662 commits using `git log --oneline | wc -l`. But so far smartSHARK/vcsSHARK got 2,299.

iwonajs commented 4 years ago

> Is the worker process running? This could be a case of the serverSHARK admin gui process running but not the worker.

Not sure if this answers your question; see the screenshot from yesterday, below. I should mention that yesterday I got serverSHARK running for the first time. Then I installed the issueSHARK plugin manually on the command line, not noticing that I could have done that in serverSHARK. Once I realized that, I also tried to install it via serverSHARK, but by then I had already downloaded some data using the command line. When I encountered a problem with serverSHARK (it looked like it was no longer downloading data), I tried to delete the data I had already downloaded. I was able to delete some of it in serverSHARK, but the data for projects I had created outside of it I could not, so I manually purged some of the collections. This morning I started fresh: I reinstalled serverSHARK from scratch, added the plugins only via serverSHARK, and started one job only. It got some data, about 3K out of the 27K commits I was expecting, and it is no longer downloading data.

The unique thing about my setup is that I created the serverSHARK virtual machine on Hyper-V using hashicorp/bionic64, and the VM uses a virtual switch.

[screenshot]

atrautsch commented 4 years ago

That last screenshot shows the worker, which creates the previously missing folder. It seems, though, that the Job was deleted (maybe the PluginExecution was deleted), so the result could not be stored.

ServerSHARK consists of two components which have to be running at the same time: the admin GUI, which is started via `python manage.py runserver ...`, and the worker process, which is started via `python manage.py peon` and asynchronously works its way through the jobs dropped into the queue by the admin GUI.

When I use the Vagrant file I usually connect two shells, start both, and then start mining. The shell with the worker process shows verbose output, which helps with debugging, while the admin GUI allows easy plugin installation and usage.

Do you have the output of the worker for the vcsSHARK run? It should be able to extract all commits, except when the database runs out of storage space.

iwonajs commented 4 years ago

debug.log error.log info.log

Hello! Thank you for looking into this. I attached all 3 logs in lieu of the screenshot. Let me know if these logs suffice. Yes, I do use 2 windows, one runs the admin and the other runs the peon jobs.

Much appreciated.

atrautsch commented 4 years ago

Spring-boot is a bit on the large side; usually we would mine this over our HPC cluster. I have tried with the serverSHARK Vagrant box, and the memory has to be increased for a repository of this size. What is happening is that the machine runs out of memory and kills the Python processes. In a local test I increased the memory in the Vagrantfile from 2048 to 8192 and now it seems to be running. You could try this to get the VCS data.

Edit: 8192 MB is still not enough; I am now testing with 14 GB.

Edit 2: It seems like memory is not freed during collection. A workaround would be to assign as much memory as possible and perform multiple runs; vcsSHARK skips already collected data.
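For reference, the memory increase goes into the provider block of the Vagrantfile. A sketch assuming the VirtualBox provider (the exact provider block in the serverSHARK Vagrantfile may differ):

```ruby
Vagrant.configure("2") do |config|
  config.vm.box = "hashicorp/bionic64"
  config.vm.provider "virtualbox" do |vb|
    # Raised from the default 2048 MB; 8192 was still not enough for
    # spring-boot, so assign as much as the host allows.
    vb.memory = "8192"
  end
end
```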

iwonajs commented 4 years ago

Thank you! I will try that. Much appreciated. I will let you know how that goes.

iwonajs commented 4 years ago

I set up dynamic memory on my Hyper-V VM. I resumed the data extraction using only ONE CPU (the default) with vcsSHARK, and so far so good.

For completeness, I want to say that the "no such file or directory" error is there even though the process is running. It seems that this is a separate issue.

I would like to speed up the process.

Option 1: Because of the memory leak, do you advise against increasing the CPU count? That is, will increasing the CPUs make running out of RAM more likely?

Option 2: What about downloading the repository for my project of interest (spring-boot) to my hard drive and then running the data extraction with vcsSHARK on the command line, i.e., not using the serverSHARK interface and thus the peon asynchronous task manager? Would this possibly be faster?

Thank you so much.

atrautsch commented 4 years ago

You could increase the CPUs, as MongoDB can then run on its own. The "no such directory or file" error will persist, because the output and error files are created only after the job is completed; in this case they will only be created on the last successful run. I am in the process of changing this, but I have to run additional tests.

The repository is downloaded locally anyway, so there is no speedup possible there. I think an additional CPU may increase the speed, as then the Python process and MongoDB would not have to share one CPU.

iwonajs commented 4 years ago

Thank you!

atrautsch commented 4 years ago

I updated the worker code. It now creates the folders and log files while executing the job and not after it is done.

yiikou commented 3 years ago

Hi @iwonajs, may I ask where this view comes from? [screenshot]

sherbold commented 3 years ago

That view is generated by serverSHARK when you do not use the admin page, i.e., YOUR_URI/visualizations/overview/ instead of YOUR_URI/admin/.