MarijnQ closed this issue 3 weeks ago.
Hey @MarijnQ, to clarify the behavior you're seeing: is the process hanging or is it exiting out? If it's exiting, do you happen to know what the exit code is?
@tgaddair So in Anaconda I have it all in notebook code blocks. It just finishes the block and leaves it there, with no exit code or error code. I can launch a new block and the machine still works, but Ludwig won't train.
Thanks @MarijnQ. I wonder if there's an error message getting swallowed by the notebook. A couple things to try:
If neither of those works, I would try making sure our example scripts run, like the Titanic example we have here, to check whether the error is specific to your dataset / model config.
One other thing I'll mention is that we just landed support for M1 acceleration with MPS. To try it out, make sure you have the master branch of Ludwig installed and set LUDWIG_ENABLE_MPS=1 in the environment.
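A minimal sketch of setting that flag from Python, assuming Ludwig reads LUDWIG_ENABLE_MPS from the process environment at startup (so it must be set before `import ludwig` runs or before the `ludwig` CLI is launched). The helper name `enable_ludwig_mps` is hypothetical.

```python
import os


def enable_ludwig_mps() -> dict:
    """Set LUDWIG_ENABLE_MPS=1 for this process and return a copy of the environment.

    Assumption: Ludwig reads this variable when it starts, so it has to be set
    before `import ludwig` or before the `ludwig` CLI is spawned.
    """
    os.environ["LUDWIG_ENABLE_MPS"] = "1"
    # Child processes started afterwards inherit os.environ by default; the copy
    # can also be passed explicitly, e.g. subprocess.run([...], env=env).
    return dict(os.environ)


env = enable_ludwig_mps()
print(env["LUDWIG_ENABLE_MPS"])  # prints 1
```

From a terminal the equivalent is `export LUDWIG_ENABLE_MPS=1` before running `ludwig`; in a Jupyter cell it is `%env LUDWIG_ENABLE_MPS=1`.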
@tgaddair Amazing, I'll try these tomorrow! 👍
@tgaddair
Alright, so when I set LUDWIG_ENABLE_MPS=1, I get the error ModuleNotFoundError: No module named 'mlflow'.
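A ModuleNotFoundError inside a notebook often means the package was installed into a different Python than the one the kernel runs (for example when `!pip` resolves to another environment). A small stdlib check, assuming that mismatch is the cause here; the helper name `module_status` is hypothetical:

```python
import importlib.util
import sys


def module_status(names):
    """Map each top-level module name to whether the running interpreter can import it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


print("kernel interpreter:", sys.executable)
print(module_status(["mlflow", "ludwig", "torch"]))
# If mlflow shows False, install it into this same interpreter, e.g. in a notebook cell:
#   %pip install mlflow
```

`%pip` (unlike `!pip`) is an IPython magic that installs into the environment of the running kernel.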
In Jupyter Notebook I apparently can't get log files; I haven't found a way to get those.
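Since the notebook may be swallowing the output, one way to get both a log file and the exit code asked about earlier is to launch the CLI from Python and write everything it prints to disk. A minimal sketch; `run_logged` and the log file names are hypothetical:

```python
import subprocess
import sys
from pathlib import Path


def run_logged(cmd, log_path="ludwig_run.log", env=None):
    """Run cmd, write everything it prints to log_path, and return its exit code."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # send stderr into the same pipe so both end up in one log
        text=True,
        env=env,  # None means inherit os.environ (including LUDWIG_ENABLE_MPS if set)
    )
    Path(log_path).write_text(result.stdout)
    return result.returncode


# Smoke test with the current interpreter
print(run_logged([sys.executable, "-c", "print('logging works')"], "smoke_test.log"))  # prints 0

# Hypothetical usage with the ludwig CLI:
# code = run_logged(["ludwig", "experiment",
#                    "--dataset", "CompanyAndIndust.csv",
#                    "--config", "model_definition.yaml"])
# print("exit code:", code)
```

Per the Python subprocess documentation, a negative return code -N means the child was terminated by signal N (POSIX only), which would fit a run that ends without any Python traceback.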
I tried running it in a Mac terminal and changed the code so it should run there, but it keeps throwing syntax or indentation errors. Is there another environment I could try it in?
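Lines that start with `!` (such as `!pip install ...` and `!ludwig experiment ...` in the code below) are IPython shell escapes, so a plain `python script.py` run rejects them as a SyntaxError; that is a likely, though unconfirmed, source of the terminal errors. A sketch of the same shell steps in plain Python, with hypothetical helper names:

```python
#!/usr/bin/env python3
"""Plain-Python version of the notebook's `!pip ...` / `!ludwig ...` lines."""
import os
import subprocess
import sys


def pip_cmd(*packages):
    """argv equivalent of `!pip install ...`, tied to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", *packages]


def ludwig_cmd(dataset, config):
    """argv equivalent of `!ludwig experiment --dataset ... --config ...`."""
    return ["ludwig", "experiment", "--dataset", dataset, "--config", config]


def run_ludwig_experiment(dataset, config, enable_mps=True):
    """Launch the ludwig CLI (must be on PATH) and return its exit code."""
    env = dict(os.environ)
    if enable_mps:
        env["LUDWIG_ENABLE_MPS"] = "1"
    return subprocess.run(ludwig_cmd(dataset, config), env=env).returncode


# Example (not executed here):
#   subprocess.run(pip_cmd("ludwig[full]"), check=True)
#   print(run_ludwig_experiment("CompanyAndIndust.csv", "model_definition.yaml"))
```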
This is the current code I run now, not the one from the terminal:
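# MPS support is on Ludwig's master branch (per the note above), so install from GitHub
!pip install 'ludwig[full] @ git+https://github.com/ludwig-ai/ludwig.git'
# LUDWIG_ENABLE_MPS is an environment variable, not a pip argument
%env LUDWIG_ENABLE_MPS=1
# The cu113 index hosts CUDA 11.3 builds for NVIDIA GPUs; on an M1 Mac use the default wheel
!pip install torch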
import pandas as pd
from datetime import datetime as dt
import numpy as np
df2 = pd.read_csv("/Users/marijnquartel/Documents/Data/Industry Report/CompanyName_Industry.csv", index_col=0)
# Merge related industries into broader categories
industry_map = {
    'non-profit organization management': 'philanthropy',
    'motion pictures and film': 'entertainment',
    'music': 'entertainment',
    'performing arts': 'entertainment',
    'law practice': 'legal services',
    'e-learning': 'education',
    'education management': 'education',
    'higher education': 'education',
    'media production': 'entertainment',
    'primary/secondary education': 'education',
}
df2 = df2.replace({'industry': industry_map})
# Drop every industry that has fewer than thresholdVal entries
thresholdVal = 1000
df = df2[df2.groupby("industry")["industry"].transform('size') >= thresholdVal]
#Create a sample of the dataset in equal sizes based on industry
dataset = df.groupby('industry').apply(lambda x: x.sample(200,replace=True))
dataset.reset_index(drop=True, inplace=True)
# Make labels token-friendly: whitespace -> '_', '&' -> 'and', '/' -> '_or_', '-' and ',' -> '_'
# Assign the result back; inplace=True on a selected column may not update `dataset`
dataset['industry'] = (
    dataset['industry']
    .str.replace(r'\s+', '_', regex=True)
    .str.replace('&', 'and', regex=False)
    .str.replace('/', '_or_', regex=False)
    .str.replace('-', '_', regex=False)
    .str.replace(',', '_', regex=False)
)
dataset.to_csv('CompanyAndIndust.csv', index=False)
model_definition="""
input_features:
-
name: name
type: text
level: word
encoder: parallel_cnn
output_features:
-
name: industry
type: text
"""
with open("model_definition.yaml", "w") as f:
f.write(model_definition)
!ludwig experiment \
  --dataset CompanyAndIndust.csv \
  --config model_definition.yaml
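After the threshold filter every remaining industry has at least 1000 rows, so drawing 200 per industry with replace=True is not needed, and it can put the same row into both the training and test splits. In pandas the change is just dropping replace=True (it defaults to False), e.g. `x.sample(200, random_state=0)`. A stdlib sketch of the same balanced, duplicate-free sampling on toy rows; `balanced_sample` is a hypothetical name:

```python
import random
from collections import defaultdict


def balanced_sample(rows, label_key, n_per_label, seed=None):
    """Return n_per_label rows for every label, drawn without replacement."""
    rng = random.Random(seed)  # seeded for reproducible samples
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    sample = []
    for label, group in by_label.items():
        if len(group) < n_per_label:
            raise ValueError(f"{label!r} has only {len(group)} rows, need {n_per_label}")
        sample.extend(rng.sample(group, n_per_label))  # no row is picked twice
    return sample


# Toy data: 5 "banking" rows and 5 "retail" rows
rows = [{"name": f"company {i}", "industry": "banking" if i % 2 else "retail"} for i in range(10)]
picked = balanced_sample(rows, "industry", 3, seed=0)
print(len(picked))  # prints 6
```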
I just tried running the Rotten Tomatoes dataset from the Getting Started section on your website.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Tue Feb 14 09:15:23 2023
@author: marijnquartel
"""
!pip install ludwig
!pip install tensorflow
import pandas as pd
df = pd.read_csv('/Users/marijnquartel/Downloads/rotten_tomatoes.csv')
model_definition="""
input_features:
- name: genres
type: set
preprocessing:
tokenizer: comma
- name: content_rating
type: category
- name: top_critic
type: binary
- name: runtime
type: number
- name: review_content
type: text
encoder: embed
output_features:
- name: recommended
type: binary
"""
with open("model_definition.yaml", "w") as f:
f.write(model_definition)
from ludwig.api import LudwigModel
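# The CLI runs in its own shell process and can't see the Python variable df, so pass the CSV path
!ludwig experiment \
  --dataset /Users/marijnquartel/Downloads/rotten_tomatoes.csv \
  --config model_definition.yaml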
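The `ludwig` command runs as a separate process and cannot see Python variables such as `df`, so `--dataset` needs a file path. If the DataFrame has been changed in Python, one option is to save it first and hand the CLI that file. A minimal sketch; `ludwig_cli_args_for` and the `dataset.csv` file name are hypothetical:

```python
import tempfile
from pathlib import Path


def ludwig_cli_args_for(df, config_path, workdir=None):
    """Save a DataFrame to CSV and return the argv for `ludwig experiment` on that file."""
    workdir = Path(workdir) if workdir else Path(tempfile.mkdtemp())
    csv_path = workdir / "dataset.csv"
    df.to_csv(csv_path, index=False)  # pandas DataFrame.to_csv; skip the index column
    return ["ludwig", "experiment",
            "--dataset", str(csv_path),
            "--config", str(config_path)]


# Usage in the notebook (not executed here):
#   import subprocess
#   subprocess.run(ludwig_cli_args_for(df, "model_definition.yaml"))
```

Alternatively, since the snippet already imports LudwigModel, the Python API can take the DataFrame directly, e.g. `LudwigModel(config="model_definition.yaml").experiment(dataset=df)`; it is worth checking that signature against the installed Ludwig version.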
This raises the error
ModuleNotFoundError: No module named 'mlflow'
Here I am again. I tried running Colab on a local runtime and got this message:
/Users/marijnquartel/opt/anaconda3/lib/python3.9/site-packages/torch/nn/modules/conv.py:309: UserWarning: Using padding='same' with even kernel lengths and odd dilation may require a zero-padded copy of the input be created (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/Convolution.cpp:896.)
return F.conv1d(input, weight, bias, self.stride,
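That message is a PyTorch UserWarning from the 1D convolution layer, not an exception, so on its own it should not end training. If it clutters the output, it can be filtered in the Python process doing the training (for example when training through LudwigModel rather than the `ludwig` CLI, which runs in a separate process). A sketch using the standard `warnings` module:

```python
import warnings

# Hide only this specific PyTorch UserWarning (padding='same' with an even kernel size).
# `message` is a regular expression matched against the start of the warning text.
warnings.filterwarnings(
    "ignore",
    message=r"Using padding='same' with even kernel lengths",
    category=UserWarning,
)
```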
I got no output at all when I enabled MPS.
When I run Ludwig in Anaconda, I get the experiment description and the data is preprocessed, and I see the "model" and "Warnings and other logs" sections, but these stay empty.
I am running Python 3 (ipykernel) in Jupyter Notebook 6.4.8 on a MacBook Pro M1 running macOS Ventura 13.1 (22C65).
What's going wrong here? It does run on Google Colab (with PyTorch; I tried the same in Jupyter Notebook, without success). I just want to train on my data on my laptop, making use of the M1 chip.
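Before digging into Ludwig itself, it may be worth confirming that the notebook's Python can see the Apple GPU at all. A hedged stdlib-plus-PyTorch check; `mps_status` is a hypothetical helper, and it relies on `torch.backends.mps.is_built()` / `is_available()`, which PyTorch added in 1.12. If "machine" reports x86_64 on an M1 Mac, the interpreter (for example an Intel build of Anaconda) is running under Rosetta translation, which is a plausible reason MPS would not be usable.

```python
import importlib
import importlib.util
import platform


def mps_status():
    """Summarize whether this Python process could use PyTorch's MPS (Apple GPU) backend."""
    status = {
        # 'arm64' for a native Apple Silicon Python, 'x86_64' under Rosetta or on Intel
        "machine": platform.machine(),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "mps_built": False,
        "mps_available": False,
    }
    if status["torch_installed"]:
        try:
            mps = importlib.import_module("torch.backends.mps")  # absent before PyTorch 1.12
        except ImportError:
            return status
        status["mps_built"] = bool(mps.is_built())
        status["mps_available"] = bool(mps.is_available())
    return status


print(mps_status())
```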
I'm running the following code:
I get no errors or anything; it just stops after printing the following: