nanovms / ops

ops - build and run nanos unikernels
https://ops.city
MIT License

cannot map dir with python tensorflow #477

Open ssttevee opened 4 years ago

ssttevee commented 4 years ago

I am unable to package any python program that includes tensorflow :(

Environment:

ubuntu 18.04
python: 3.6.7
ops 0.1.9
nanos 0.1.23

Minimal reproduction steps:

python -m virtualenv --always-copy venv
source venv/bin/activate
python -m pip install tensorflow
ops load python_3.6.7 -c config.json

Contents of config.json:

{
    "MapDirs": {"venv/lib/*": "/.local/lib"},
    "Args": ["-m", "site"]
}

Resulting error:

2020/03/23 21:01:29 mkfs:metadata ... (nasty 365k character long line)
log full

panic: exit status 2

goroutine 1 [running]:
github.com/nanovms/ops/cmd.loadCommandHandler(0xc00025e280, 0xc0005df0e0, 0x1, 0x3)
    /Users/eyberg/go/src/github.com/nanovms/ops/cmd/load.go:192 +0x1eca
github.com/spf13/cobra.(*Command).execute(0xc00025e280, 0xc0005df050, 0x3, 0x3, 0xc00025e280, 0xc0005df050)
    /Users/eyberg/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:830 +0x2ae
github.com/spf13/cobra.(*Command).ExecuteC(0xc0001c6a00, 0x1175fe1, 0xc000379f88, 0xc000094058)
    /Users/eyberg/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914 +0x2bb
github.com/spf13/cobra.(*Command).Execute(...)
    /Users/eyberg/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864
main.main()
    /Users/eyberg/go/src/github.com/nanovms/ops/ops.go:8 +0x28

Removing the MapDirs property from config.json gives the expected result:

[python3 -m site]
booting /home/steve/.ops/images/python3.img ...
qemu-system-x86_64: warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
assigned: 10.0.2.15
sys.path = [
    '/',
    '/usr/lib/python36.zip',
    '/usr/lib/python3.6',
    '/usr/lib/python3.6/lib-dynload',
]
USER_BASE: '/.local' (doesn't exist)
USER_SITE: '/.local/lib/python3.6/site-packages' (doesn't exist)
ENABLE_USER_SITE: True
exit status 1

PS: The ops/nanos experience with python is really lacking. The examples are too trivial and documentation is poor. I had to jump through many hoops before I figured out that I can just map my venv folder to /.local. I don't think I would've gotten this far if I had any less python experience. Also, libraries like libffi are probably worth bundling in the official pkg.

eyberg commented 4 years ago

hi, thanks for filing the issue

the 365k output you saw is the manifest, which basically contains a fs layout of everything that was put into venv - following those steps, for me, that's something like 80 directories, 1799 files
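To get a rough feel for how large the manifest will be before running ops, the mapped directory can be counted up front. This is a minimal sketch, not part of ops: `count_tree` is a hypothetical helper name, and the totals it reports depend on what pip installed.

```python
import os


def count_tree(root):
    """Return (directories, files) found beneath root, excluding root itself."""
    n_dirs = 0
    n_files = 0
    # os.walk visits every directory once; each visit lists its direct children.
    for _dirpath, dirnames, filenames in os.walk(root):
        n_dirs += len(dirnames)
        n_files += len(filenames)
    return n_dirs, n_files
```

Calling `count_tree("venv/lib")` from the directory used in the reproduction steps returns the two totals as a tuple.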

that log full message is a known bug that is resolved in master - you can get around that by using 'ops load -n' - this will utilize the nightly build (builds from master)

let me know if this gets you further down the road

as for building other packages - that's definitely a possibility. if you think it would be good for us to have an official tensorflow package, I'm happy to work with you on one, as I totally agree the average user shouldn't have to build the base pkg. very loose instructions are in:

https://github.com/nanovms/ops/blob/master/PACKAGES.md

ssttevee commented 4 years ago

The -n flag seemed to do the trick, at least for ops load. My real goal was to get it running on gcp, but ops image create also suffers from the same issue and it doesn't seem like it takes -n.

I think it would probably be more worthwhile to add a mechanism that sniffs out linked libraries and automatically includes them from the working environment. That would make creating packages and images much more streamlined for everyone, first-party or otherwise.

eyberg commented 4 years ago

hrm... that's pretty lame but definitely something we can add support for https://github.com/nanovms/ops/issues/478

eyberg commented 4 years ago

as for your other comment, we already automatically include linked libraries a la ldd, but for things that aren't explicit, yes, they need to go into pkgs

ssttevee commented 4 years ago

Oh, I see that in the code now. From what I can tell, it looks like it only works for the main elf binary.

In this case, for example, a numpy shared object, which requires libffi, is imported at runtime so it isn't included. Perhaps it should be expanded to include all files? Or all files with a certain extension/regexp or in a certain folder? Or at least add a new config option to declare files to sniff?

EDIT: I think that is something that I may be able to do, if pull requests are accepted.

eyberg commented 4 years ago

pull requests are definitely accepted, although in this case it sounds like a tensorflow package would be more appropriate. i don't know if a regex would cut it here - for what you are proposing you'd really want to dump the ast of the python script in question, and from looking at debugging output, even then the interpreter itself will look for shared libs in a half-dozen different places
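As a rough illustration of the "dump the ast" idea, the standard-library `ast` module can list the modules a script imports statically. This is only a sketch under that assumption: `top_level_imports` is a hypothetical name, and it sees nothing that is loaded dynamically (for example through `importlib` or a C extension opening further shared libs), which is exactly the limitation described above.

```python
import ast


def top_level_imports(source):
    """Return the sorted top-level module names imported statically by source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import numpy.linalg as la" contributes "numpy"
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 marks a relative import ("from . import x"); skip those
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return sorted(names)
```

The resulting names would still have to be mapped to files on disk and then to the shared objects those files pull in.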

we do this a lot for jvm based applications where we take a particular java package && then add in the framework or whatever on top

if you want to take a stab at building a tensorflow pkg lmk, otherwise I can try and whip one up - we currently have limited support for user-created packages via ops load --local (load local package)

ssttevee commented 4 years ago

I was thinking more along the lines of sniffing the linked libraries of any included .so files.
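A minimal sketch of that idea, assuming a Linux host with `ldd` on the PATH. Neither function exists in ops; `parse_ldd_output` and `sniff_shared_objects` are hypothetical names, and the library path in the comment is illustrative only.

```python
import os
import re
import subprocess

# Matches resolved ldd lines of the form "<name> => /abs/path (0x...)",
# e.g. "libffi.so.6 => /usr/lib/x86_64-linux-gnu/libffi.so.6 (0x...)" (illustrative).
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(/\S+)")


def parse_ldd_output(text):
    """Map library name -> resolved path; unresolved or path-less lines are skipped."""
    deps = {}
    for line in text.splitlines():
        match = _LDD_LINE.match(line)
        if match:
            deps[match.group(1)] = match.group(2)
    return deps


def sniff_shared_objects(root):
    """Run ldd on every shared object under root and merge the resolved dependencies."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".so") or ".so." in name:
                path = os.path.join(dirpath, name)
                result = subprocess.run(
                    ["ldd", path],
                    stdout=subprocess.PIPE,
                    stderr=subprocess.PIPE,
                    universal_newlines=True,
                )
                # ldd exits non-zero for files it cannot inspect; ignore those.
                if result.returncode == 0:
                    found.update(parse_ldd_output(result.stdout))
    return found
```

Running `sniff_shared_objects("venv/lib")` would return a name-to-path mapping that could then be added to the image. Note that `ldd` may execute code from the file it inspects, so it should only be pointed at trusted files.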

I'll leave that to you because I don't think I'll be able to use nanos for my current project again. I need fork for parallelizing python but it's not implemented :(