sadikovi / spark-netflow

NetFlow data source for Spark SQL and DataFrames
Apache License 2.0

Add version 9 support #77

Open adrien-gthb opened 6 years ago

adrien-gthb commented 6 years ago

Hi @sadikovi, are you planning to add version 9 support anytime soon?

sadikovi commented 6 years ago

Hello @raulot-a! I have not thought about it, since I use version 5 only. But that does not mean we cannot add it!

It should be fairly straightforward to add a new version as long as you have some samples of version 9 files (I am not sure if flow-tools can generate them). One would need to make a small number of changes in https://github.com/sadikovi/spark-netflow/tree/master/src/main/java/com/github/sadikovi/netflowlib/version and add a format similar to version 7 (https://github.com/sadikovi/spark-netflow/blob/master/src/main/scala/com/github/sadikovi/spark/netflow/version7/DefaultProvider.scala), and it should work!

I do not have samples of version 9 files, so it may be difficult to test afterwards. If you want, you could open a PR with the changes. Let me know what you think!

sadikovi commented 6 years ago

Yes, I will be adding support for v9 in a couple of weeks, possibly this weekend.

adrien-gthb commented 6 years ago

Hello @sadikovi, thank you for your response!

It would have been my pleasure to contribute by adding version 9 support to the project, but unfortunately I'm very busy at the moment. However, I have version 9 files in my possession, so if you need any help testing, please let me know. Also keep in mind that version 9 is template-based in order to be more flexible. As I haven't dived into the code (yet), I'm not sure whether this could cause a problem.
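To illustrate what "template-based" means here, the following is a minimal sketch of reading one template record from a NetFlow v9 Template FlowSet, following the export packet layout in RFC 3954. It is not code from spark-netflow: the class and method names (`V9TemplateSketch`, `parseFirstTemplate`, `recordLength`) are hypothetical, and it covers the wire format of an export packet rather than any on-disk file format.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of spark-netflow.
final class V9TemplateSketch {
  /** One (field type, field length) pair from a template record. */
  static final class Field {
    final int type;
    final int length;

    Field(int type, int length) {
      this.type = type;
      this.length = length;
    }
  }

  /**
   * Reads the first template record of a Template FlowSet (FlowSet ID 0).
   * ByteBuffer is big-endian by default, which matches network byte order.
   */
  static List<Field> parseFirstTemplate(ByteBuffer buf) {
    int flowSetId = buf.getShort() & 0xFFFF;
    int flowSetLength = buf.getShort() & 0xFFFF;
    if (flowSetId != 0) {
      throw new IllegalArgumentException("Not a Template FlowSet: id " + flowSetId);
    }
    int templateId = buf.getShort() & 0xFFFF;
    int fieldCount = buf.getShort() & 0xFFFF;
    // Template IDs start at 256; lower values are reserved for FlowSet IDs.
    if (templateId < 256) {
      throw new IllegalArgumentException("Invalid template id: " + templateId);
    }
    // FlowSet header (4 bytes) + template header (4 bytes) + 4 bytes per field.
    if (flowSetLength < 8 + 4 * fieldCount) {
      throw new IllegalArgumentException("FlowSet too short for " + fieldCount + " fields");
    }
    List<Field> fields = new ArrayList<>(fieldCount);
    for (int i = 0; i < fieldCount; i++) {
      int type = buf.getShort() & 0xFFFF;
      int length = buf.getShort() & 0xFFFF;
      fields.add(new Field(type, length));
    }
    return fields;
  }

  /** Size in bytes of one data record described by the template. */
  static int recordLength(List<Field> fields) {
    int total = 0;
    for (Field f : fields) {
      total += f.length;
    }
    return total;
  }
}
```

Because the record layout is only known after a template has been read, a reader cannot rely on a fixed per-version schema the way it can for versions 5 and 7.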

> Yes, I will be adding support for v9 in a couple of weeks, possibly this weekend.

That would be great!

sadikovi commented 6 years ago

Could you attach some sample files to this issue? It would definitely help. Thanks.

adrien-gthb commented 6 years ago

Here are some sample files containing randomly generated data: netflow_v9_samples.zip

Hope it helps!

sadikovi commented 6 years ago

I get "Skip unknown record type 10" when reading any file from the archive.

sadikovi commented 6 years ago

Sorry, I have not started working on this. I am currently having problems getting or generating NetFlow version 9 files that actually work with nfdump, so that I can use them for testing. The archive files give me the message "Skip unknown record type 10"; it is possible that I was using the wrong command to read the files.

Once I have them, I will update the code. I will start without schema evolution/merge support (you will only be able to read files that have the same schema, which should cover most cases), but we might add it in the future.
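The "same schema only" restriction could be enforced with a check along these lines. This is a sketch under assumptions, not the project's implementation: the class `SameSchemaCheck`, the method `requireSameSchema`, and the representation of a schema as an ordered list of field names are all hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of spark-netflow.
final class SameSchemaCheck {
  /**
   * Returns the schema shared by all files, or throws if any file differs.
   * A schema is modelled here as an ordered list of field names (an assumption).
   */
  static List<String> requireSameSchema(Map<String, List<String>> schemaByFile) {
    if (schemaByFile.isEmpty()) {
      throw new IllegalArgumentException("No files to read");
    }
    String firstFile = null;
    List<String> expected = null;
    for (Map.Entry<String, List<String>> entry : schemaByFile.entrySet()) {
      if (expected == null) {
        // The first file defines the schema every other file must match.
        firstFile = entry.getKey();
        expected = entry.getValue();
      } else if (!expected.equals(entry.getValue())) {
        throw new IllegalStateException(
          "Schema of " + entry.getKey() + " differs from schema of " + firstFile);
      }
    }
    return expected;
  }
}
```

Failing fast on the first mismatch keeps the reader simple; merging differing schemas would be the later "schema evolution" work.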

adrien-gthb commented 6 years ago

What command and version of NfDump are you using?

> Once I have them, I will update the code. I will start without schema evolution/merge support (you will only be able to read files that have the same schema, which should cover most cases), but we might add it in the future.

Sounds good to me.

sadikovi commented 6 years ago

I do not think it is version 9 of Cisco NetFlow; it looks like it is nfdump's own file format.

czivar commented 3 years ago

Hey @sadikovi @raulot-a

Have you made progress on the V9/V10 support?

sadikovi commented 3 years ago

No, I have not made any progress on this; to be honest, I have not looked at v9/v10 support for quite some time, as you can tell. It should not be very difficult to add, though.

czivar commented 3 years ago

Ok, I am looking into adding it.

NickGoodfella commented 3 years ago

> Ok, I am looking into adding it.

Any update on adding V9 support? I'm interested in that.

sadikovi commented 3 years ago

I am not sure if @czivar is working on it (if you are, please reply in the comments), but the main blocker is having sample files for testing, IMHO. If someone could provide those files, it would be much easier to do the development and testing. I could take a look into this as well.