grorge123 opened 3 weeks ago
Would love to see a performance comparison of the same model on llama.cpp on Intel CPUs.
I ran a simple test on an i7-12700K. Model: llama-2-7b-chat.Q4_0. Number of input tokens: 12.
Does that mean it is actually slower than llama.cpp?
Yes, the current result shows Neural Speed is slower than llama.cpp. In addition, running Neural Speed directly also cannot match llama.cpp's runtime performance.
@grorge123 Could you check how many CPU cores are used in both benchmarks? According to the documentation, Neural Speed should perform better on Intel CPUs than other runtimes.
Neural Speed uses all of the CPU cores (20).
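To rule out a thread-count mismatch between the two benchmarks, both runtimes should be pinned to the same number of cores. A minimal Rust sketch to report what the machine exposes (the `OMP_NUM_THREADS` remark is an assumption about Neural Speed's OpenMP build, not something confirmed in this thread):

```rust
use std::thread;

fn main() {
    // Number of logical cores the OS exposes to this process; on an
    // i7-12700K this is 20 (8 P-cores with hyper-threading + 4 E-cores).
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    println!("logical cores available: {}", cores);

    // For a fair comparison, launch llama.cpp with `-t <cores>` and,
    // assuming Neural Speed uses an OpenMP backend, set
    // OMP_NUM_THREADS=<cores> in the environment before its benchmark.
}
```

Comparing the reported value against what each benchmark actually uses would confirm whether both runs saturate the same core count.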
I updated the Neural Speed version to 1.0. The new result is: Load time: 4989 ms, Compute time: 96263 ms.
I tested on another computer with an i7-10700; the other variables are the same.
In this case Neural Speed has better performance than llama.cpp, but I have no idea why the i7-10700 performs better than the i7-12700K.
Hello, I am a code review bot on flows.network. Here are my reviews of code commits in this PR.
Overall Summary: The GitHub Pull Request titled "Add neural speed example" introduces new functionality efficiently by adding new files, updating dependencies, implementing logic for using the Neural Speed plugin, and updating relevant documentation. However, there are several potential issues and areas for improvement identified in the individual summaries:
Potential Issues and Errors:
Most Important Findings:
Details
Commit 612c2c396653f0911f3ded717016627f41a9b51a
Key Changes:
New files are added under the wasmedge-neuralspeed directory.
Potential Problems:
The use of /usr/local/bin and /usr/local/lib in the installation instructions may not be suitable for all systems and could lead to errors on different setups.
Overall, the patch introduces new functionality efficiently but could be improved in terms of error handling and documentation.
Commit 14cb1941e1cde1feecdd7b70f5bd4cca6503e125
Key Changes:
Potential Problems:
The method call 'context.fini_single()' has been replaced with 'graph.unload()', but there is no direct explanation provided as to why this change was made. It's important to ensure that this change doesn't introduce any new issues or break existing functionality.
It seems like the comment at the end of the patch file is incomplete (line: graph.unload().expect("Failed to free resource");). It is advisable to provide a meaningful comment explaining the rationale behind this change. Overall, it's important to review the impact of replacing 'context.fini_single()' with 'graph.unload()' to ensure that it aligns with the project's design and functionality. The changes should also be properly documented for better understanding by other contributors.
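Since unload() must now be called explicitly, one defensive pattern worth considering is an RAII guard so the graph is freed even on early returns or panics. This is only a sketch with a stand-in Graph type, not the real wasmedge-wasi-nn API:

```rust
use std::cell::Cell;
use std::rc::Rc;

// Stand-in for a plugin-backed graph handle; the real type would come
// from the wasmedge-wasi-nn crate and talk to the Neural Speed plugin.
struct Graph {
    freed: Rc<Cell<bool>>,
}

impl Graph {
    fn unload(&mut self) -> Result<(), String> {
        self.freed.set(true); // release plugin resources here
        Ok(())
    }
}

// RAII guard: Drop guarantees unload() runs, so a forgotten explicit
// call cannot leak the graph.
struct GraphGuard(Graph);

impl Drop for GraphGuard {
    fn drop(&mut self) {
        self.0.unload().expect("Failed to free resource");
    }
}

fn main() {
    let freed = Rc::new(Cell::new(false));
    {
        let _guard = GraphGuard(Graph { freed: freed.clone() });
        // ... run inference while the guard is alive ...
    } // guard dropped here: unload() runs automatically
    assert!(freed.get(), "graph should be unloaded after the guard drops");
    println!("graph unloaded: {}", freed.get());
}
```

Whether this fits depends on the project's design; the key point is that cleanup tied to Drop cannot be skipped by an early return path.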
Commit 75d677266bbf80900b0038f65de3261908334cca
Key Changes:
Potential Problems: