pluskid / Mocha.jl

Deep Learning framework for Julia

test.txt file #177

Closed: ashleylid closed this 8 years ago

ashleylid commented 8 years ago

Hi,

This is more of a question than an issue. I have a list of HDF5 files and, following the tutorial, I list them in my test.txt file. But when I run the training, I think it's only seeing one file rather than looping through all of them (and concatenating them somehow).

Should I be combining them outside of the .txt file instead? Putting together a single 4-D HDF5 file is proving to be tough.

Thx

EDIT: Also a really basic question: how do I access the obj_val at each iteration, to work out the RMSE? Or is there an easier way to find the training error?

pluskid commented 8 years ago

It is supposed to iterate through all the files, as shown here. If it is not working, something must be wrong somewhere. Could you make a minimal example that reproduces the bug?

ashleylid commented 8 years ago

Hi,

I guess the minimal example would be your MNIST tutorial; I pretty much used that as my base.

data_layer  = HDF5DataLayer(name="train-data", source="data/train.txt",
    batch_size=64, shuffle=true)

etc...

The only difference is that my train.txt lists:

/input_data/train1.h5
/input_data/train2.h5
/input_data/train3.h5
/input_data/train4.h5

(with full paths)
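A minimal sketch of generating such a list file from Julia, with one absolute path per line as in the listing above. The helper name write_source_list is hypothetical (it is not part of Mocha), and it only writes the text file; it does not check that the HDF5 files exist or are readable.

```julia
# Hypothetical helper (not part of Mocha): write one absolute HDF5 path per line,
# matching the layout of the train.txt listing above.
function write_source_list(listfile, h5files)
    open(listfile, "w") do io
        for f in h5files
            # abspath turns a relative path into a full path
            println(io, abspath(f))
        end
    end
    return listfile
end
```

For example, write_source_list("data/train.txt", ["input_data/train1.h5", "input_data/train2.h5"]) would write two full paths, assuming the data directory already exists.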

When you run your training, yours reports: Accuracy (avg over 10000) = X.0000%, which is your full test set.

But when I run mine, I get: Accuracy (avg over 200) = 39.0000%. 200 is the number of data rows in each of my h5 files, so I was expecting to see 1000.

But now that I look closer, I only have 200 rows in my testing data, so obviously it's only going to say 200. I think.

Now that I have your attention (which I really, really appreciate), I was wondering if you could help me find the training error easily. Is there a method that I am missing?

I get the value for each predicted y, and I could load each of my h5 files to find the actual values. But since you have already gone to all this trouble, I am guessing there is already a way set up that outputs the training error. And if I wanted to plot the training error against the test error, how do I access the values that are printed by solve(solver, net) and the coffee breaks?
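If the predicted values and the actual values are collected into two arrays of the same length, the RMSE is the square root of the mean squared difference. A minimal sketch; the function name rmse is hypothetical, and how the predictions are pulled out of the network and the targets out of the h5 files is not shown here.

```julia
# Hypothetical helper: root-mean-square error between predictions and targets.
function rmse(pred, actual)
    length(pred) == length(actual) ||
        throw(DimensionMismatch("pred and actual must have the same length"))
    # sqrt(mean of squared differences), written without the Statistics stdlib
    return sqrt(sum(abs2, pred .- actual) / length(pred))
end
```

For example, rmse([1.0, 2.0], [1.5, 2.5]) is 0.5, since both predictions are off by 0.5.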

Thank you for all your time.

pluskid commented 8 years ago

If you set up a "coffee lounge" (see the doc here), the statistics will be saved automatically. You can load them later (see for example tools/plot_statistics.jl).
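The plotting script treats each loaded statistic as a dictionary mapping iteration number to value (it sorts the keys to get the x axis). Assuming that layout, a minimal sketch for lining up a training curve and a test curve on the iterations where both were recorded, so they can be plotted against each other. The helper name align_stats and the sample numbers in the usage note are hypothetical.

```julia
# Hypothetical helper: given two statistics dictionaries keyed by iteration
# (the layout tools/plot_statistics.jl assumes), return the sorted iterations
# recorded in both, plus the matching values from each dictionary.
function align_stats(train_stats::Dict, test_stats::Dict)
    # keep only iterations present in both dictionaries
    iters = sort([i for i in keys(train_stats) if haskey(test_stats, i)])
    train_y = [train_stats[i] for i in iters]
    test_y = [test_stats[i] for i in iters]
    return iters, train_y, test_y
end
```

With train = Dict(100 => 0.9, 200 => 0.5, 300 => 0.3) and test = Dict(200 => 0.6, 300 => 0.4), only iterations 200 and 300 are kept; the resulting vectors can then go to plot(iters, train_y, ...) and plot(iters, test_y, ...) just as the script plots a single statistic.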

ashleylid commented 8 years ago

Thank you! I knew I was missing something. I will check it out and come back to close this if it's all sorted.

EDIT: I am running tools/plot_statistics.jl but getting the following output:

$ julia plot_statistics.jl -i 3 statistics.h5
WARNING: int(s::AbstractString) is deprecated, use parse(Int,s) instead.
 in depwarn at deprecated.jl:73
 in int at deprecated.jl:50
 in map at ./abstractarray.jl:1305
 [inlined code] from /home/ashley/.julia/v0.4/Mocha/tools/plot_statistics.jl:78
 in anonymous at no file:0
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:304
 in process_options at ./client.jl:280
 in _start at ./client.jl:378
while loading /home/ashley/.julia/v0.4/Mocha/tools/plot_statistics.jl, in expression starting on line 77
WARNING: Using non-boolean collections with any(itr) is deprecated, use reduce(|, itr) instead.
If you are using any(map(f, itr)) or any([f(x) for x in itr]), use any(f, itr) instead.
 in depwarn at deprecated.jl:73
 in nonboolean_any at deprecated.jl:797
 in any at reduce.jl:358
 [inlined code] from /home/ashley/.julia/v0.4/Mocha/tools/plot_statistics.jl:79
 in anonymous at no file:0
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:304
 in process_options at ./client.jl:280
 in _start at ./client.jl:378
while loading /home/ashley/.julia/v0.4/Mocha/tools/plot_statistics.jl, in expression starting on line 77
Hit <enter> to continue

Will try to figure out what's up.

EDIT: Managed to get it running by changing a few things. Starting from the using PyPlot line in your script:


using PyPlot
# if parsed_args["idx"] != ""
  # parse the comma-separated indices; int(s) is deprecated in Julia 0.4, parse(Int, s) replaces it
  selected_ind = [parse(Int, s) for s in split(parsed_args["idx"], ",")]

  # index validation disabled for now
  # (any(map(f, itr)) is deprecated; any(f, itr) replaces it)
  # if any([x < 0 || x > length(numbered_names) for x in selected_ind])
  #   list_stats(numbered_names)
  #   error("Invalid index in your list : $selected_ind make sure the indices are between 1 and $(length(numbered_names))")
  # end

  figure()
  for ind in selected_ind
    # get the right stats file
    (stats_num, fname, selected) = numbered_names[ind]
    stats = all_stats[stats_num][selected]

    # do the actual plotting
    # x will simply be the iteration number,
    #   which we will sort
    x = sort(collect(keys(stats)))
    # and y is the statistic corresponding to
    # the selected statistics you want to plot
    y = [stats[i] for i in x]
    plot(x, y, label="$(fname)/$(selected)")
  end
  legend()
  # show once, after all curves and the legend have been added
  plt[:show]()

  print("Hit <enter> to continue")
  readline()
  close()
# end

# delete temporary file if it was created
if parsed_args["tmp"]
  for f in stats_files
    rm(f)
  end
end

Yes, there is a lot commented out, but at least it's working.
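For reference, the two deprecation warnings above spell out the Julia 0.4 replacements (parse(Int, s) instead of int(s), and any(f, itr) instead of any(map(f, itr))), so the commented-out index check can probably be kept. A minimal sketch of that check as a standalone function; the name parse_indices is hypothetical and not part of plot_statistics.jl, and it rejects any index outside 1 to n, as the script's error message describes.

```julia
# Hypothetical helper (not part of plot_statistics.jl): parse a comma-separated
# index list such as "1,3" and check every index against n available statistics.
function parse_indices(idx, n)
    # parse(Int, s) replaces the deprecated int(s)
    inds = [parse(Int, strip(s)) for s in split(idx, ",")]
    # any(f, itr) replaces the deprecated any(map(f, itr))
    if any(x -> x < 1 || x > n, inds)
        error("Invalid index in your list: $inds; make sure the indices are between 1 and $n")
    end
    return inds
end
```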

I think this is now a completely new issue. Please let me know if I should close this one and open a new issue.